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1.  Introduction 


Maximum  likelihood  estimates  of  the  parameters  in  Gaussian  time  series  maximize 
the  likelihood  function 

(1.1)  £(«)  =  (2*)-TI2\£\-'l'>  exp  , 

where  y  is  the  vector  of  observations,  £y  =  0,  £yy'  =  X1  =  X(0).  Exact  maximum  likeli¬ 
hood  estimation  is  complicated  because  even  for  standard  models  the  needed  determinants 
and  inverses  are  either  not  known  in  closed  forms  or,  if  they  are  known,  they  involve  com¬ 
plicated  parametric  functions.  Thus  only  for  the  first-order  autoregressive  model  are  the 
explicitly  forms  of  the  maximum  likelihood  estimators  of  its  two  parameters  known  (Hasza, 
1980). 

One  procedure  that  is  often  used  is  to  operate  mathematically  with  the  likelihood 
function,  so  that  it  can  be  evaluated  numerically  at  any  point  of  the  parameter  space, 
and  then  to  optimize  the  function  by  varying  values  of  the  parameters,  using  an  efficient 
computer  program.  For  example,  the  IMSL  Library  (1979)  package  of  Fortran  subroutines 
uses  a  “modified  steepest  descent  algorithm”  to  compute  estimates  of  the  parameters  of 
ARIMA  models;  the  statistical  package  BMDP  (1985)  uses  “Gauss-Marquardt  methods” 
to  perform  linear  and  nonlinear  estimations.  One  idea  in  this  area  by  Box  and  Jenkins 
(1976)  was  the  “backcasting”  procedure  to  evaluate  the  approximate  likelihood  function 
in  some  time  series  models.  Useful  suggestions  have  been  the  Cholesky  decomposition  of 
the  covariance  matrix  and  “Woodbury’s  formula,”  as  in  Phadke  and  Kedem  (1978),  for 
example.  Ansley  (1979)  studied  several  approaches  and  proposed  a  new  algorithm;  he 
reviewed  earlier  work  by  Newbold  (1974),  Dent  (1977),  Ali  (1977),  Osborn  (1977),  and 
Hillmer  and  Tiao  (1979).  He  showed  the  equivalence  of  the  proposals  by  Newbold  (1974) 
and  Dent  (1977),  and  related  this  approach  to  Ali  (1977).  See  also  Nicholls  and  Hall 
(1979). 

Another  approach  is  to  derive  an  iterative  procedure.  Some  iterative  procedures  use 
the  likelihood  equations  deduced  by  setting  the  derivatives  of  the  likelihood  function  equal 
to  0  to  obtain  a  procedure  of  the  form  0i  =  g(0i- 1)  for  some  function  g  depending  on  y, 
where  0{,  i  =  1,2,...  are  successive  numerical  values  of  the  estimates,  with  do  as  a  starting 
value.  Anderson  (1975,  1977)  considered  severed  procedures  for  exact  maximum  likelihood 
estimation,  and  some  approximations.  Godolphin  and  de  Gooijer  (1982)  presented  a  pro¬ 
cedure  for  the  first-ord^  moving  average  model.  In  general,  deriving  iterative  procedures 
tends  to  involve  more  mathematical  elaborations  of,  and  more  knowledge  about  the  like¬ 
lihood  function,  than  the  approaches  mentioned  in  the  previous  paragraph.  A  general 
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iterative  procedure  for  estimation  purposes  is  the  so-called  “EM-algorithm”  (Dempster,  et 
al.,  1977),  proposed  initially  for  some  missing  value  statistical  problems;  in  our  context 
the  initial  values  can  be  thought  of  as  missing  values,  and  when  applicable  the  EM  algo¬ 
rithm  will  suggest  a  sequence  of  conditional  expectation-optimization  steps  for  the  iterative 
computation  of  the  maximum  likelihood  estimates.  Another  approach  involves  Kalman  fil- 
tej.Mg  techniques;  seo  fo*-  example,  Harvey  (1981).  The  connection  of  the  present  work 
with  the  EM-algorithm  and  with  Kalman  filtering  will  be  considered  briefly  in  Section  8. 

The  purpose  of  this  paper  is  to  consider  in  detail  iterative  procedures  for  exact  max- 
imum  likelihood  estimation  in  the  first-order  Gaussian  moving  average  model.  Section  2 
introduces  the  model;  Section  3  deeds  in  general  with  iterative  procedures,  and  the  spe¬ 
cific  procedures  are  derived  in  Section  4.  Sections  5,  6,  and  7  contain  the  evaluations  of 
quadratic  forms  and  traces  in  the  time  and  frequency  domains.  Section  8  contains  various 
comments  on  the  procedures.  Section  9  contains  evaluations  of  the  numbers  of  operations 
needed  to  compute  certain  traces  and  quadratic  forms.  Finally  Section  10  considers  the 
approach  of  Box  and  Jenkins  (1976)  to  the  estimation  problem. 


2.  The  First-Order  Moving  Average  Model 


The  Gaussian  moving  average  process  of  order  1  with  mean  0,  denoted  here  by  MA(1), 
is  defined  by 


(2.1) 


yt  =  ut  +  aut-i,  t  =  ...,-1,0,1,...  , 


where  the  yt  are  observable,  the  axe  unobservable  independent  normal  random  variables 
with  €ut  =  0,  £tt2  =  <72,  0  <  a2  <  oo,  and  a  and  cr2  are  parameters.  The  Gaussian  moving 
average  is  a  stationary  stochastic  process  for  any  value  of  a. 

The  autocovariance  (or  covariance)  sequence  of  the  process  is 

cr,  =  cr2(l  +  a2),  5  =  0, 

(2.2)  =er2a,  W  =  1, 

=  0,  |*|  >1- 


The  autocorrelation  (or  correlation)  sequence  of  the  process  is 


(2.3) 


Ps  =  1, 

_  O 

—  T+a7’ 

=  0, 


s  =  0, 

M  =  i. 
1*1  >  i- 
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For  convenience,  we  write  p  for  p\ .  The  covariance  sequence  satisfies  the  Fourier  inversion 
formula 


-  /: 


e'x*f(\)d\ , 


where  /(A)  is  the  spectral  density  of  the  process,  given  by 


2  2 

(2.5)  /(A)  =  —  |e,A  4  a |2  =  —(1  4  o2  4  2qcos  A), 

Z7T  Z7T 


—  7 r  <  A  <  7T. 


If  yi,...,yr  is  a  sample  from  (2.1),  y  =  {yi  ,■■■■,  yr)'  has  a  multivariate  normal 
distribution  with  expectation  Ty  =  0  and  covariance  matrix  £yy'  =  X  =  (<T|,-_jj). 

Let  us  introduce  the  T  x  T  matrices  P  and  R  (the  correlation  matrix)  defined  by 


(2.6) 


X  =  <r2  P  =  cr2(l  4-  a2)R  =  aoR. 


We  note  that  X,  P,  and  R  can  be  written  as  linear  combinations  of  the  identity  matrix  I 
and  the  matrix  G  that  is  symmetric,  has  l’s  along  the  diagonals  immediately  above  and 
below  the  main  diagonal,  and  0’s  elsewhere.  In  this  notation 


(2.7)  £  =  a0I  +  <?iG  =  <r2(l+a2)I  +  a2aG, 

(2.8)  P  =  (1  4-  o2)/  +  qG, 

(2.9)  R  =  I  +  -^-G  =  /  +  pG. 

1  4"  o4 

For  any  a  and  T,  X,  P,  and  R  are  positive  definite,  the  first  because  we  also  assume 
that  a2  >0.  As  a  function  of  p,  R  is  positive  definite  for  —a  <  p  <  a,  where  a  = 
l/{2cos[7 r/(T  4- 1)]};  see  Anderson  and  Takemura  (1986). 

The  likelihood  function  can  be  written  as  a  function  of  a  and  a2  as 

(2.10)  £*(a,a2)  =  (2i)-T/2|r|-1'2exp(-l»'r-1y} 

(2.11)  =  (2x)-^2(c2)-T'2|PrI/2  exp  j  . 

Instead  of  operating  with  (2.11)  for  purposes  of  maximum  likelihood  estimation,  we 
can  separate  the  analysis  into  two  parts:  we  maximize  (2.11)  with  respect  to  a2  at 

(2.12)  £2  =  ^y'P-'y, 
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and  then  substitute  n2  =  a2  in  (2.11)  to  derive  the  “concentrated  likelihood  function,” 
which  is  a  constant  times  the  square  root  of 


(2.13) 


n*(ct) 


1 

IPHsAP-1!/)7" 


Since  p  =  a/(  1  +  a2)  =  (1/q)/[1  +  (1/a)2],  the  likelihood  function  attains  all  possible 
values  on  the  set  for  which  |a|  <  1,  0  <  <r2  <  oo.  Hence,  without  loss  of  generality  we 
restrict  attention  to  this  set.  Note  that  |a|  <  1  is  the  condition  for  invertibilitv,  that  is,  to 
express  (2.1)  as  an  infinite  autoregression. 

In  terms  of  ctq  and  p  the  likelihood  function  can  be  written  as 


(2.14)  L(<r0,p)  =  (2jt)  T/2(<70)  T,2\R\  1/2  exp  y'.R  *yj  . 


If  the  analysis  is  separated  into  two  parts,  we  maximize  (2.14)  with  respect  to  ctq  at 


(2.15)  S„  =  i  y'R-'y, 

and  then  maximize  with  respect  to  p  the  function 


(2.16) 


n(p)  = 


1 


Note  that  n*(a)  =  n[p(a)],  where  p(a)  =  a/(l  +  a2). 

For  |a|  <  1  and  |p|  <  |  the  following  three  pairs  provide  alternative  equivalent 
parametrizations  for  (2.1):  a  and  cr2,  o-q  and  <7i,  p  and  <tq.  For  example, 


(2.17) 


1-  y'l—  4(<T1/<70)2  =  l-y/1-  4p2 
2(<Ti/<70)  2p 


Hence,  for  purposes  of  maximum  likelihood  estimation  we  can  operate  with  (2.10)  as  a 
function  of  <70  and  cn,  with  (2.11)  as  a  function  of  a  and  <72,  or  with  (2.14)  as  a  function 
of  p  and  o 0 .  Similarly,  in  (2.13)  we  operate  with  a  function  of  a,  and  in  (2.16)  with  a 
function  of  p.  The  relationship  between  the  two  parametrizations  will  be  studied  in  more 
detail  in  Section  3. 

For  further  details  about  the  moving  average  model  see,  for  example,  Anderson  (1971 ) 
or  Anderson  and  Mentz  (1980). 
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3.  Some  Approaches  to  the  Iterative  Estimation  by  Maximum  Likelihood 


The  method  of  maximum  likelihood  proposes  to  estimate  the  parameters  by  maxi¬ 
mizing  (2.10),  (2.11)  or  (2.14);  alternatively,  we  can  use  (2.12)  and  (2.13).  or  (2.15)  and 
(2.16).  A  basic  difficulty  comes  from  the  complicated  nature  of  the  parametric  functions 
that  are  involved.  Let  At  =  |P|;  then  as  functions  of  a  and  p  we  have  for  |a|  <  1. 


(3.1) 

(3.2) 


T  =  0.1 . 


respectively.  These  results  can  be  verified  by  showing  that  At  satisfies  the  homogeneous 
difference  equation 


At  -  (1  +  q2  )At-i  +  o2  At— 2  =  0,  T  =  2,3,..., 


with  A0  =  1,  Ai  =  (1  —  o4)/(l  —  a2).  See  Anderson  (1971),  Section  6.7,  for  example.  If 
p,ji  denotes  the  z,  j-th  component  of  P_1,  then 

(3-3)  j  >i, 

so  that  as  a  function  of  a, 


(3.4) 


(~ay- 


(1  _  q2*)(i  _  a2(T  ;+l)) 
(l-a2)(l-a2(T'+1))  ’ 


See  Shaman  (1969). 


j  >  *• 


3.1.  Four  aspects  of  maximum  likelihood  estimation 
I.  Likelihood  vs.  concentrated  likelihood  functions 

As  indicated  in  Section  2,  operating  with  (2.10)  or  (2.11)  in  terms  of  a  and  a2  is 
mathematically  equivalent  to  operating  with  (2.12)  and  with  the  concentrated  likelihood 
function  or  with  (2.13);  similarly,  operating  with  (2.14)  in  terms  of  cr0  and  p  is  mathemat¬ 
ically  equivalent  to  operating  with  (2.15)  and  (2.16),  in  the  sense  that  the  solutions  are 
the  same.  However,  for  the  same  set  of  parameters,  the  estimating  equations  are  not  the 
same  for  the  likelihood  and  concentrated  likelihood  functions. 


II.  a  and  a2  vs.  p  and  a0 

We  have  a  choice  of  parameters  to  consider.  As  indicated  in  Section  2,  the  three  pairs 
that  we  discussed  provide  equivalent  alternative  parametrizationsfor  the  model  (2.1).  Since 
the  maximum  likelihood  estimation  procedure  is  invariant  under  such  type  of  transforma¬ 
tions  of  the  parameters  it  follows  that  from  a  mathematical  point  of  view  it  is  immaterial 
with  which  set  we  choose  to  operate.  However,  different  sets  of  parameters  may  lead  to 
different  estimating  equations. 


III.  Time  vs.  frequency  domains 


We  presented  our  results  so  far  (except  for  (2.4)  and  (2.5))  in  the  time  domain.  We 
can  consider  the  effect  of  a  Fourier  transforma uon.  Let  K  be  the  orthogonal  T  x  T  matrix 
with  components 


(3-5) 


2  .  nj  k 

sin 


j,k  =  1 . T, 


t+  i  r  +  i’ 

and  let  D  be  the  diagonal  matrix  with  diagonal  elements 


(3.6) 

Then 


dj  —  2  cos 


T+  r 


j  =  1,...,T. 


(3.7) 


K'K  =  KK'  =  J,  K'GK  =  D. 


We  have  a  way  to  diagonalize  the  matrix  G  appearing  in  (2.7),  (2.8).  and  (2.9).  If  y  =  Kz, 
then  z  —  K'y ,  that  is, 


(3.8) 


j  =  i 


*hen  z  is  multivariate  normal  with  £z  =  0  and 


(3.9)  £zz'  =  a0I  +  cr\D  =  oq{I  +  pD). 

This  then  provides  an  alternative  approach  that  we  may  call  a  “frequency  domain 
approach”:  any  expression  in  terms  of  G  can  be  translated  into  an  expression  in  terms 
of  D ,  and  any  method  formally  presented  in  terms  of  y  can  be  translated  into  a  method 
presented  in  terms  of  z. 
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IV.  Scoring  vs.  N  ^wton-Raphson 

To  maximize  the  L  or  n  functions  (or  the  L*  or  n*  functions)  we  have  availahle  two 
procedures  based  on  a  Taylor’s  expansion.  We  illustrate  this  with 

(3.10)  logn(p)  =  -log  \R\  -  T\og{y'R~1y) 


coming  from  (2.16).  The  expansion  of  its  derivative  with  respect  to  p  around  a  value  po  it 


(c.ii : 


-j-  log  n(p)  =  -j—  log  n(p) 
dp  dp 


p-p  0 


j 2 

+  (P-  Po)^\ognip) 


ii(p ,  Po  )■ 


P— Pc 


where  R{p,  po)  is  a  remainder.  The  estimating  equation  is  obtained  by  setting  this  deriva¬ 
tive  equal  to  C. 

The  Newton-Raphson  procedure  consists  in  replacing  the  remainder  by  0  and  sett’ng 
p  =  pb)  and  pu  =  The  iterative  procedure  is  then 

(3.12)  loS77(p)j  P"l)  =  j^lcgn(p)  +  {“^2  logn^} 

where  all  derivatives  are  evaluated  at  p  — 

In  the  method  of  scoring  the  second  derivative  is  replaced  by  its  expectation,  where 
the  random  vector  y  is  taken  with  distribution  having  parameter  p  =  po- 

About  these  alternative  approaches  we  note  that  the  dichotomies  likelihood  vs.  con¬ 
centrated  likelihood  functions  and  scoring  vs.  Newton-Raphson  procedures,  arise  from 
theoretical  considerations.  The  other  dichotomies,  a  and  a2  vs.  p  and  cr0  and  time  vs. 
frequency  domains,  are  motivated  on  computational  grounds. 

From  this  analysis  it  follows  that  to  estimate  the  parameters  of  the  model  (2.1)  by 
maximum  likelihood  under  normality,  we  have  sixteen  alternative  approaches,  which  may 
lead  to  different  iterative  procedures.  Some  of  these  have  already  been  presented  in  the 
literature,  as  will  be  noted  in  Section  4. 

Anderson  (1977)  emphasized  these  dichotomies,  while  operating  with  the  likelihood 
function. 


3.2.  Relations  between  two  parametrizations 


Since  cr0  =  (1  +  a2)cr2  =  aQ{a,a2),  say,  and  p  =  a/(l  +  a2)  =  p(a),  say, 


(3.13) 


L*(a,  a2)  =  L[rr0(a,a2),r[<x)}- 
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In  L(ao,p)  the  range  of  p  is  —  a  <  p  <  a,  where  a  =  l/{2cos  [7 r/(T  +  1)J  }•  A  maximum 
of  L(o0,  p)  occurs  at  do  >  0  and  —a  <  p  <  a  (with  probability  1).  These  values  satisfy  the 
likelihood  equations 


(3-14) 


d  log  L(aQ,p) 


(3.15) 


d  log  L(a0,P ) 


For  T  —  2  the  solution  to  (3.14),  (3.15)  is  unique,  but  for  T  >  2  there  may  be  multiple 
solutions  to  (3.14),  (3.15).  The  maximum  likelihood  estimates  do ,  p  are  the  solution  that 
makes  £(<ro,p)  largest  within  the  range  oq  >  0,  —a  <  p  <  a  . 

In  £*(a.cr2)  the  range  of  a  is  — 1  <  a  <  1,  and  (with  probability  1)  the  maximum 
occurs  at  —  1  <  a  <  1  and  d 2  >  0.  These  values  satisfy  the  derivative  equations 


(3.16) 


d  log  £*(q,  <t2)  d  log  £(<7q,  p)  d°o  5  log  £(a0,  p)  dp 


d  log  £(<r0,p)o  2  ,  ^  log  £(cr0,  p)  1-a2 

- ^ - 2o<’  + - Tf, - (ITT  )*  =  °' 


(3.17) 


d  log  £*(o, a2)  _  d  log  L(a0,p)  da0 
do 2  do0  do2 


d  log  £(<70, p) 


(1  +a2)  =  0. 


If  (3.14)  is  solved  for  oq  =  oo(p)  as  given  in  (2.15)  and  substituted  back  into  (3.15), 


we  obtain  |  times 


(3.18) 


d  log  n(p) 
dp 


y'R-'GR-'y 

y'R'ly 


=  —  tr  R-'G  +  T  V  -  -  =  0. 


If  (3.17)  is  solved  for  o2  —  cr2{a)  and  substituted  back  into  (3.16),  we  obtain  |  times 


(3.19) 


d  log  n*(a) 


=  {  -trR-'G  +  T 


y'R-'GR-'y \  1  -  o2 

y'R~ly  )  (1  +  a2)2 


v  '  da  V  y'R~ly  i  +  <>2)2 

where  R  =  I  +  p(a)G.  Note  that  (3.19)  has  the  solutions  a  —  1  and  —1.  Any  other 

solution  o*  to  (3.19)  yields  a  solution  p*  =  p(a*)  to  (3.18).  However,  a  solution  p*  to 

(3.18)  yields  a  solution  to  (3.19)  only  if  -5  <  p*  <  \  because  then  p*  =  p(a)  can  be 

solved  for  the  real  root  _ 

.  1  -  v/l  -  4p*2 


which  lies  in  the  interval  [—1,1]. 
Consider 


(3.20) 


n{p)  = 


my'R-'yf  iE.,0 +  />*)(£ 


+  pdt) 


T- 1 


T ' 


xA  ris^<( 1 + 

»  j  \ 

As  p  — ►  —  l/d^  =  1/dj  =  a,  the  numerator  approaches  0  and  the  denominator  approaches 

thus  n(p)  — *  0  as  p  — >  ±a.  The  derivative  of  log  n{p)  is 


(3.21) 


d  log  n(p)  _  ^  dj _ T _  dtz\ 

Z—*  1  4-  nd.  \-^T  zj  Z— > 


dp 


Setting  (3.21)  to  0  yields 

T 

(3.22) 


ST 1  +  £L 

dtz\ 


\+pdt 


jr[  (l  +  pdt)2 ' 


T  T  2 

_  y  -A-  y  _ : ej—  =  o. 

1  4-  rt/V.  '  1  4-  nd . 


f^(l  +  pdt)2  1  +  pd„  l  +  pdt 

T 

Multiplication  by  |i2|2  =  n  (1  +  pdt)2  gives 

i= l 

T  T  T 

(3.23)  T ^  z2dt  H(1  +  pdrf  ~Y,d*  IK1  +  Pdg)  yz]H(i  +  pdr)  =  o. 

1=  1  rjit  3  =  1  9^3  1=1  rjit 

Note  |i2|2  is  of  degree  2 T  —  2  if  T  is  odd  and  2T  if  T  is  even  (because  d(r+ 1)/2  =  0  f°r  r 
odd). 

If  T  is  even,  the  first  polynomial  in  (3.23)  is 
T 

(3.24)  T  ^  dt  JJ(1  +  pdr)2 

t=  1  r^f 

T  T 


—  t zt  (i  ~  p^t)2  ~  ^T+i-i (i  +  p^t)2]^i  Jlu — 


1=1 


—  T —  Zj’jrX_t  ~  2pdt(z 2  +  z^+i-t)  +  (zt  ~  zt+i  —t)P* nd  ~  P2d2r) 


1=1 


T 

=  TyZJ_Ul±lzl  | G[2  p*(T-l)  _2TyZA±j5±l=l  ,G|2  p2T— 3  _j_  .  . 
1=1  dt  1=1 

=^{tf 

^  t=l  *  t=l  *  ^ 


r  =  l 
r#t 


£  -v2  I  ~2 
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since  dt  ^  0,  t  —  1, . . , ,  T.  For  T  even,  |G|  =  (—l)7"/2.  If  T  is  odd,  the  first  polynomial  is 

T  —  1  T  —  1 

(3.25)  T  [*?(1  -  P^)2  -  4+i-tU  +  Pd02]^  lid- P24)2 

t=l  L  *■  =  ! 

r- 1 

2  r 

=  r  ^  (d2  -  4+i-<)  —  2 pdt{Zf  +  4+i-<) 

t=i 

r-i  ( 

+  p2d2tizl  --  4-t-i_t)]d<  II  d  —  p2^2)2 


= Tp2(T_2)  ]T(z2  -  4+1 n  < + . . 


T  2  T 

s — '  rr 


■4r  E  i  n  «e+- 


=  ^-<T||(T+iy  + 


If  T  is  even,  the  first  factor  in  the  second  term  is 


1  2  2 

(3.26)  ]T  d9  !](!  +  pdq)  =  2  [d.(l  -  pd.)  -  d3(  1  +  pd,)]  fj(l  -  p2d2,) 


«=  1  q^a 


T  X 

2  2 


*=1 

«#• 


=  T\G\pT  1  +  coefficient  x  pT  3  +  . . . 

=  (—l)^TpT~l  +  coefficient  xpT‘3  +  ...  . 


If  T  is  odd,  the  first  factor  in  the  second  term  is 


(3.27)  ^[d,(i-pds)-da(i+pds)]  no-p2^) 


T-l  T-l 


*=i  «=» 


10 


i 

=  (T  -  1)  JJ  (-d2a)pT~2  -f  coefficient  x  pT~A  + 


T  4-  1  T— 2  rt:  •  j.  T— 4 

V  «  ■*  ^  I  /»^£»TT1  /»1  Atlf  N/  /I 


=  (  — —  1) — - —  pT  2  +  coefficient  xp  +... 

=  — -  pT~ 2  +  coefficient  x  pT~A  +  . . .  . 

We  use  the  fact  that  dt  is  the  coefficient  of  pT~l  in  }R\. 

If  T  is  even,  the  second  factor  is 

T  Z 

(3.28)  no +/?dr)  ~  i  [■s’*  (1  —  pdt)  +  4+i-t  (1  +  pdt) j  J^J(1  P2d\) 


t=  1 


t  ,  t 

=  ^  [2?  +  2r+i-<  -  pdt{zt  ~  4+i-t)]  n^1  ~  P2<%) 
*»»  Tr7\ 

T  Z 

=  "  J2(zt  ~  4+i-t)dt  f[{-dl)pT'1 


+  ^(zt  +  4+1-t)  \[{~d2r)pT  2  + 


=  y  z<  ~  |G|  PT~l 

fe  dt 
$  -2  .  ,2 


-  E z* +  S’*1-  igi  d''J  +  --- 


_i_  .2 


T  2 

T  r — '  Zt  T—2 


If  T  is  odd,  the  second  factor  is 


(3.29)  ^  [z‘  +  zT+i-<  “  -  4+i-t)]  HO"  p2d^ 


+  zh±  no-  P2<1  r)  =  n  (“4)4±I  PT 


/  ,  AI-=I  r+1  2  T— 1  , 

+  ...  =  (-1)  *  — y-Sr±i  P  + 
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For  T  even,  the  left-hand  side  of  (3.23)  is 


(3.30) 


t= 1  ' 


For  T  odd,  the  left-hand  side  of  (3.23)  is 

\2 


(3.31) 


(T  —  1)(T  +  1)  2  2T—3  , 

- -  ‘I±lP  +■••  • 


With  probability  1  the  derivative  equation  is  of  degree  2 T  —  3  for  every  T. 

The  coefficients  of  the  polynomial  in  p  in  (3.23)  are  linear  functions  of  Zj,...,z^, 
which  are  independent  variables,  hence  the  set  of  variables  has  a  density.  The  roots 
being  simple  is  the  complement  to  some  roots  being  multiple.  The  latter  event  is  described 
by  some  algebraic  relations  among  the  coefficients  and  hence  among  z\ , . . . ,  z\.  The  event 
has  Lebesgue  measure  zero  and  hence  probability  zero.  Hence,  with  probability  1  the 
degree  of  the  polynomial  equation  (3.23)  is  odd,  the  number  of  real  roots  is  odd,  and  they 
are  distinct. 

At  each  root  the  derivative  is  0;  hence,  a  local  maximum  or  minimum  occurs  at  the 
point;  further,  n(p )  >  0  and  n{p)  — ►  0  as  p  — ►  ±a.  Thus  the  number  of  maxima  is  one 
more  than  the  number  of  minima. 

The  relative  maxima  of  n(p)  occur  fo”  p  6  (— a ,  a).  The  relative  maxima  of 


(3.32) 


occur  for  a  €  [—1, 1]-  Let  p\  <  . . .  <  px  be  the  values  of  p  for  which  maxima  occur.  The 
probability  that  one  of  these  values  is  ±|  is  0  (by  the  above  argument);  those  events  can 
be  ignored.  If  —  ^  <  pj  < 


(3.33) 


a  = 


1  - 


2  Pj 


is  real  and  yields  a  relative  maximum  of  n*(a).  If  \pj\  >  j ,  the  solution  to  (3.33)  is  not 
real  and  hence  does  not  correspond  to  a  maximum  of  n*(a).  Hence,  the  number  of  maxima 
with  respect  to  a  can  be  smaller  than  with  respect  to  p.  Anderson  and  Takemura  (1986) 
showed  that  the  root  3  =  1  yields  a  relative  maximum  of  n*(a)  if  dlog  n(p)/dp  >  0  at 
p  =  I.  (Alternatively,  the  root  3  =  —  1  yields  a  relative  maximum  if  the  derivative  is 

negative  at  p  =  —  £). 
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Let  tji  <  ■  ■  ■  <  tjk-  1  be  the  values  of  p  for  which  minima  of  n(p)  occur.  Then 
(3.34)  Pi  <  m  <  ■■■  <  rlK—i  <  PK - 

If  Vk- 1  <  \  <  pk  for  some  k(k  =  2, . . . ,  A'),  then  dn/dp  >  0  at  p  =  \  and  there  is  a  relative 
maximum  of  n*(a )  at  a  —  1.  If  Pk-i  <  \  <  Vk  then  dn/dp  <  0  at  p  =  |  and  a  =  1  gives 
a  minimum. 

Let  ai  <  •  •  •  <  qj  be  the  values  of  a  giving  relative  maxima  of  n*(o).  Each  satisfies 
(3.33)  except  possibly  Qi  =  —1  and/or  aj  =  1.  The  last  may  occur  only  if  pk  >  Thus 
J  <  K.  The  maximum  likelihood  estimate  of  a  is  that  one  of  £*i, . . .  ,a  j  for  which  n*(oy ) 
is  greatest. 

The  maximum  likelihood  estimate  of  p  is  that  one  of  pi,...  ,Pk  for  which  n(pj )  is 
greatest.  If  that  p}  is  in  (— |,  |),  then  the  maximum  likelihood  estimate  of  n  is  given 
by  (3.33)  for  that  pj.  If  the  maximizing  pj  is  outside  (  — |,  |)  (and  hence  pi  <  —  |  or 
Pk  >  §),  the  maximizing  a  may  be  a  solution  to  (3.33)  for  another  pj  or  it  may  be  —1  or 
1.  If  n(|)  >  n(pj)  for  every  p}  £  (— |,  |)  and  n(|)  >  n(- 1)  the  maximizing  a  is  a  =  1;  if 
n(-i)  >  n(pj )  for  every  p}  £  |)  and  n(-|)  >  n(\),  the  maximizing  q  is  a  =  -1.  If 

n(i)  <  n(pj)  and  n(- 1)  <  n(p})  for  some  j,  then  the  maximizing  a  is  (3.33)  for  some  j. 

Anderson  and  Takemura  (1986)  evaluated  the  probability  that  a  —  1  and  alternatively 
a  =  —  1  yield  relative  maxima.  The  probability  that  o  =  1  or  S  =  —  1  is,  of  course,  less 
than  the  probability  that  a  relative  maximum  occurs  at  o  =  1  or  a  =  -1,  respectively. 


4-  Iterative  Procedures  Derived  from  Newton-Raphson  and  Scoring  Methods 

In  this  section  we  derive  several  iterative  procedures  to  estimate  the  parameters  of 
model  (2.1),  with  emphasis  on  the  estimation  of  p.  After  some  preliminaries  and  the 
introduction  of  some  notation,  we  operate  successively  in  the  time  and  frequency  domains. 

4.1.  Notation  and  general  rules 

To  estimate  by  maximum  likelihood  the  covariances  of  the  moving  average  part  and  the 
coefficients  of  the  autoregressive  part  of  an  ARMA(p,q)  model,  Anderson  (1977),  Section 

4.1,  derived  the  equations  that  in  general  correspond  to  the  iterative  procedures  in  the 
Gaussian  case.  In  the  case  of  a  MA(q)  model  these  equations  are 

(4.1)  A, -i (a,  -  &,-i)  =  «>— i, 
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where  =  (<7q\<t^,.  . . ,  dq^)'  is  the  vector  of  estimated  covariances  at  step  i,  Ai-i  is  an 
estimate  of  the  information  matrix  for  the  covariances,  and  Sj_i  is  composed  of  estimates 
of  d\og  L/dcrj,  j  =  0, 1, ...  ,q.  From  (4.1)  we  deduce  the  iterative  procedure 

(4.2)  Ai-iffi  =  ?,_i  +  Ai- \Bx-\  =  ft_!, 


that  for  the  MA(1)  model  is  written  in  terms  of  components  as  follows: 


(4.3) 


A 


(.-i) 

oo 


T(«-i)  A(«-D 

A10  A11 


Solving  for  and  and  using  p^  =  /d^,  we  obtain  the  iterative  procedure 


/KO  = 


(aa\  -(«•)  _ 

(4. i  A01  rl  A11  r0  j  P  —  A10  r0  A00  rl 


The  needed  A and  r*  can  be  evaluated  by  using  the  scoring  or  Newton-Raphson 
procedures,  as  will  be  shown  next. 

The  derivation  of  these  procedures  will  lead  to  certain  quadratic  forms  and  traces  that 
we  now  consider.  We  use  the  fact  that  if  A  and  B  axe  square  matrices  with  B  nonsingular, 
then  AB  =  BA  implies  that  AB~ 1  =  B~l  A.  We  use  this  result  with  A  =  G  and  B  =  R, 
for  example.  From  (2.9)  we  use  JR  =  I  +  pG,  and  hence, 


(4.5)  RG  =  ( I  +  pG)G  =  G  +  pG2  =  G(I  +  pG)  =  GR. 


We  now  define  a  set  of  quadratic  forms  by 
(4.6)  qjk  =  y'R~U+1)Gky,  j  =  —1,0,1,...;  *  =  0,1,2,..., 


and  a  set  of  traces  by 

(4.7)  tjk=trR-jGk,  i,*  =  0, 1,2,.... 

We  note  that  qjk  is  a  random  variable,  a  function  of  y,  and  that 

(4.8)  Sqjk  =  £y'R~u+1)Gky  =  £  tr  y'R-(j+1)Gky  =  tr  R-^+1)Gk£yy' 

=  tr  R-^+1)Gkr  =  tr  R-{j+1)Gk<r0R  =  <70  tr  R~jGk 

—  &otjki 

where  we  used  (2.6)  and  the  fact  that  G  and  R  commute. 

In  our  operations  we  shall  find  quadratic  forms  like  y'R~iGR~iy ,  y' R~'GR~l  Gy, 
etc.,  and  traces  like  tr  R~2G,  tr  R~lGR~* ,  etc.,  so  that  frequent  use  of  this  commutative 
property  will  be  made. 
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4.2.  Time  domain  considerations 

4.2.1.  Procedures  based  on  the  likelihood  function 


Iterative  Procedure  1  (Time,  Likelihood  Function,  Scoring) 


In  Anderson  (1977)  Section  4.2,  it  is  shown  that  (4.1)  above  leads  to  a  system  that  in 
the  case  of  a  MA(1)  model  is 


(4.9) 


tr  +tr  E^  Ga\']  =  y'E^y, 


tr  +  tr  E^GE^Ga^  =  y'E~\GE~\y. 

These  equations  can  also  be  written  with  R  instead  of  E.  using  the  fact  that  E  =  cr0R, 
and  Oq2  cancelled  throughout.  We  can  then  write  the  resulting  linear  system  as 


(4.10) 


nr11  JVf'N  pin  =  Air” 

,4‘r1’  4r‘vU°/  Ur" 


Solving  this  system  for  <7q^  and  and  using  the  definition  of  p^'\  we  obtain  the  following 
iterative  procedure: 

(4.ii)  -  nr,,s‘.r,)}  -  ®r,)sl‘r1)  - 

Iterative  Procedure  2  (Time,  Likelihood  Function,  Newton-Raphson) 


Equations  (4.2)  in  Anderson  (1975)  become,  in  the  case  of  a  MA(1)  model, 

(v'ET- ,v  -  \ tr  ^rii)so! 

(4.12)  +  (v'S-^gs-^v  -  \  tr  r-i.cf)?;0  =  jw’fir-iw  -  *>■  £->„ 

(y'S-^GS-^y  -  j  tr  ^GZ^^ 

-  \ tr  Sr-*. Gir-.G)^.0 

=  \v,Z-llG2-\y-tT  S-\G. 

Since  G  commutes  with  E  (E  —  OqR),  it  also  commutes  with  E~l .  Introducing  the 
notation 

(4.13)  «;*  =  »'2ru+1>G‘  V,  = 
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so  that  Sq*k  =  t*k ,  we  write  (4.12)  in  matrix  form  as 


(4.14) 


iy*(«-i) 

920  2l20  921  ~  2 l21 

-*(*-!)  ii*(*-ri 

'921  ~  2 l21  922  ~  2l22 


;(0 


i<0 


3 -*('-i)  _ 

2  9l0  l10 


3 -*(>-!)  _  y*(t-i) 
29ll  lll 


Since  S~x  appears  in  the  coefficients  of  the  system  (4.12)  raised  to  different  powers, 
substitution  of  £  =  uoR  will  not  produce  the  cancellation  of  all  powers  of  cr^1 .  Hence,  we 
do  not  write  an  expression  for  p  as  we  did  in  Iterative  Procedure  1,  but  leave  the  iterations 
to  be  carried  out  for  a o  and  <j\  as  indicated  by  (4.14). 


4.2.2.  Procedures  based  on  the  concentrated  likelihood  function 


Iterative  Procedure  3  (Time,  Concentrated  Likelihood  Function,  Scoring) 

We  operate  with  (3.10),  that  is, 

(4.15)  log  n(p)  =  -logjl  +  pG\  -  T log  {y'(I  +  pG)-1y}. 

We  have 

(4.16)  ±  logn(p)  =  -  tv  R-'G  +  TV'R- 

dp  y  R  ly 

_  (tr  RT1  G){y' R~l y)  -  Ty'R~xGR~ly 
y'R~1y 

=  0. 

To  apply  the  scoring  method  we  use  the  fact  that 

(4.17)  I  =  R~XR  =  H-1(J  +  pG)  =  H-1  +  pR~xG , 
so  that 

(4.18)  tr  R~lG  =  tr  R~lG(R~1  +  pRTlG)  =  tr  R~lGR~x  +  p  tr  R~lGR~'G. 
Substitution  in  the  numerator  of  (4.16)  gives  the  estimating  equation 

(4.19)  tr  £--!  GR-\  Gjy'R-2, y)^ 

=  Ty' R:\GR-\y  -  tr  R^GR  ^y'R^y), 
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or,  in  the  notation  introduced  in  (4.6)  and  (4.7), 


(4.20) 


*22  9oO  P  ~  9ll  —  *21  9oO 


Iterative  Procedure  4  (Time,  Concentrated  Likelihood  Function,  Scoring) 


To  derive  Iterative  Procedure  3  we  used  (4.17).  Another  approach  is  to  go  back  to 
the  general  structure  given  by  (3.12).  The  first  derivative  of  log  n(p )  with  respect  to  p  is 
given  by  (4.16),  and  the  second  derivative  is 

d2 

(4.21)  —  log  n(p) 

H-ign^C  r~2  y'R-lGR~1GR-iy(y,R-1y)  +  (y'R-1GR-1y)2 

(y'R'xyf 

tr  R-lGR'lG(y'R~ly)2  -  2Ty' R~lGR~' GR~l  y(y' R~l  y)  +  T(y' R~l  GR~l  y)2 

(y'R-'y)2 


The  method  of  scoring  consists  in  replacing  (4.21)  by  its  expected  value;  instead,  we 
can  replace  each  quadratic  form  in  (4.21)  by  its  expected  value,  using  (4.8).  In  the  notation 
of  (4.6)  and  (4.17)  this  is 


(4.22) 


*22  +  T 


—2(Jot22aoT  +  ((To  til)2 

Wf)2 


Finally,  Iterative  Procedure  4  to  estimate  p  is  given  by 
(4.23)  sSi-1'  {nr”  -  f  =  Tfc»  -  •s2r,) 


Iterative  Procedure  5  (Time,  Concentrated  Likehhood  Function,  Newton-Raphson) 


The  iterative  procedure  is  (3.12).  From  (4.16) 


(4.24)  (y'R-ly)24-  log  n(p)  =  T(y' R-xGR-'y)(y' R~'y)  -  tr  R~*  G(y' IT1  y)2 
dp 

=  Tquqoo  —  tn9oo- 
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d2 


From  (4.21) 

(4.25)  —(y'R~ly)1  log  n(p)  =  2Tq22qoo  -  *22$oo  -  T<ln- 

It  follows  that  the  iterative  procedure  for  p  can  be  written  as 


(4.26) 


I 


-  nr" 


sf*-1) 

Voo 


+ -n; 


-i) 


5oo  J) 


-  T 


1) 

9n 


4.3.  Frequency  domain  considerations 

In  this  section  we  write  all  quadratic  forms  and  traces  appearing  in  the  iterative 
procedures  presented  in  Section  4.2.,  in  terms  of  the  elements  introduced  in  paragraph  III 
of  Section  3. 

From  (3.7)  we  have  I  =  KK'  and  G  =  KDK\  where  D  is  diagonal  with  diagonal 
elements  dj  =  2  cos[7rj/(T  +  1)],  j  =  1,2, . . .  ,T.  Then 

(4.27)  R~  I  +  pG  =  KK'  +  pKDK'  =  A' (I  +  pD)*T', 

so  that  R  is  also  diagonalized  by  the  orthogonal  matrix  K,  and  I  +  pD  has  diagonal 
elements  1  +  2p  cos[irj /(T  +  1)].  It  then  follows  that 

(4.28)  R~9  =  K(I  +  PD)-aK',  G*  =  KD3K\  s  =  0,1 . 


Putting  these  results  together  we  find  that  the  quadratic  forms  introduced  in  (4.6) 
can  be  written  in  terms  of  the  Zj  defined  in  (3.8)  as 

(4.29)  qjk  =  y'R~^1)Gky  =  y'K(I  +  pD)-^+1)K'KDkK'y 

=  (K'y)'(I  +  pD)~u+l)Dk(K'y )  =  z\I  +  PD)~{j+l)Dkz 


-E 


j  =  —1,0,1,...;  A:  =  0,1,... 


+  Pday+'  •* 

while  the  traces  introduced  in  (4.7)  become 

(4.30)  tjk  =  tr  R~jGk  =  tr  K(I  +  pD)~jGkK' 

=  tr  (/  +  pD)~jDkK'K  =  tr  (I  +  PD)~}Dk 
dk 


-E 


j,  fc  =  0,1,...  . 
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For  an  interesting  connection  of  these  results  with  the  analysis  of  variance  see  the 
paper  by  Speed  (1987)  and  the  comments  by  Anderson  (1987). 


5.  Evaluation  of  Quadratic  Forms  in  the  Time  Domain 
5.1.  Introduction 

In  this  section,  we  develop  some  algebraic  procedures  for  the  evaluation  of  quadratic 
forms.  In  Section  9  we  compare  some  of  these  from  the  point  of  view  of  efficient  computa- 


We  first  show  that  all  quadratic  forms  qjk  used  in  Section  4.  where  j  >  k,  can  be 
expressed  as  functions  of  q3 o  =  y/JR_tj+! ' y.  In  effect,  since  R  =  / -fi  pG ,  we  can  substitute 
G  =  p~x(R  —  I)  in  qjk ,  provided  /)  /  0,  to  obtain 

(5.1)  qjk  =  y'R-(J+1)Gky  =  p~ky' R~^+1)(R  -  I)ky 


=  p-ky'R~ (j+1)  jy 

^  P-kY^(-i)k~s(ks)y'R-u'a+l)y 

3  =  0  '  ' 

=  «**  o.  j>t- 


For  example, 


(5.2)  qn  =  -(goo  —  9io)i  921  =  ~(<?io  —  920),  922  =  ~r(qoo  —  %qio  +  92o)- 

P  P  P 

These  relations  can  be  used  to  express  the  iterative  procedures  of  Section  4  as  functions 
of  the  various  traces  and  of  the  qjo.  For  example,  in  Iterative  Procedure  1,  (4.11)  becomes 

(5.3)  { +4r1,]5ir,) -nr'V.r11}?'0 

—  ‘20  9oO  \P  ‘21  +‘20  J9l0  ’ 

while  in  Iterative  Procedure  3,  (4.20)  becomes 


(5.4) 


P  ‘22  900  P  —  1  p  ‘21  900  1  Vio 
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Let  us  now  define 


(5.5)  x  =  R  1y,  v  —  Rlx. 

Then 

(5.6)  q00  =  y'R~ly  =  y'x,  q10  =  y'  H_2y  =  x'x,  q20  =  y1  R~:iy  =  y'v. 
and  we  see  that  it  suffices  to  solve  for  x  in  the  linear  system 

(5.7)  y  =  Rx. 
and  having  done  that  to  solve  for  v  the  linear  system 

(5.8)  *  =  Rv. 

Once  y,  x,  and  v  are  available,  all  quadratic  forms  appearing  in  the  iterative  pro¬ 
cedures  defined  in  Section  4  can  be  easily  expressed  in  terms  of  the  components  of  these 
vectors.  In  effect, 

T  T  T 

<7oo  =  y  ViZj,  9io  =  ^i.2,  920  =  y  XjVj, 

«=i  i=i  1=1 

T- 1 

9n  =  y'R~2Gy  =  y'R~iGR~iy  =  x  Gx  =  2  y  x,x,+i, 

1=1 

T— 1 

(5.11)  921  =  y'R~2Gy  =  x'R~1Gx  =  v  'Gx  =  y  {x,+iv,  +  Xjui+i), 

i=i 

T— 1  T-2 

(5.12)  922  =  y' R~*G2y  =  v'G2x  =  x^i  -l-  xr^r  +  2  y  x*i>,  4-  y  (x,v.+2  4  x1+2u,). 

1=2  1=1 

Hence,  it  folows  that  the  calculation  of  the  quadratic  forms  can  be  reduced  to  the 
calculation  of  the  q}o  for  j  =  0, 1  and  2;  this  in  turn  corresponds  to  solving  explicitly  for 
x  the  systems  (5.7)  and  solving  for  v  the  system  (5.8).  This  will  be  considered  in  the 
remaining  parts  of  this  section. 


(5.9) 
while 

(5.10) 
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Useful  references  for  the  treatmen.  of  linear  systems  in  the  indicated  context  are 
Anderson  (1984),  in  particular  Appendix  A,  Golub  and  Van  Loan  (1983),  and  Graybill 
(1969). 

5.2.  Cholesky  decomposition 

The  Cholesky  decomposition  of  a  matrix  is  a  useful  and  efficient  device,  that  in  our 
case  can  help  to  compute  the  needed  quadratic  forms.  We  consider  the  decomposition  of 

R 


5.2.1.  Derivations 


Since  R  is  symmetric  and  positive  definite  for  —a  <  p  <  a,  whe^e  1/2  <  a  <  1,  in  this 
range  its  unique  Cholesky  decomposition  exists.  It  can  be  written  as 

(5.13)  R  =  TT\ 

where  T  =  (#,•_,)  is  bidiagonal  (because  R  is  tridiagonal),  lower  triangular,  ta  >  0,  =  0 

for  2  <  j  and  i  >  j  +  1.  The  decomposition  can  also  be  written  as 

(5.14)  R  =  UVU', 

where  U  —  ( u,j )  is  bidiagonal,  lower  triangular,  ua  =  1,  u,,  =  0  for  i  <  j  and  i  >  j  +  1, 
and  V  =  (t;,j)  is  diagonal  with  vtl  >  0  (T  =  UV* .) 

Expression  (5.13)  is  often  called  the  Cholesky  decomposition  of  H,  and  T  is  called  the 
Cholesky  triangle.  The  procedure  to  obtain  T  is  sometimes  called  the  square  root  method. 
Setting  VU'  =  S,  say,  we  see  that  S  is  bidiagonal,  upper  triangular  and  that  JR  =  US, 
which  is  a  case  of  the  so-called  LR  decomposistion. 


Proposition  1.  The  components  of  U  and  V  in  (5.14)  satisfy 


(5.15) 


A*  _ n  rp 

.  i  s  l,...,r, 

uj-i 


and 

(5.16) 


P 

Us+1,3  =  -  = 

V»t 


s  =  l,.. 


T-  1, 


whert  the  A,  are  given  in  (3.2). 
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Proof.  Let  R  —  (rtJ).  In  terms  of  components  (5.14)  is 


T  T 


(5.17)  rtJ  --  zz  21  \ 3^-st^ji  —  ^  ^  2J.j3V39U  js  —  Hi iVa'ilji  “f*  ^i,t  —  l^i  — l,i  — 1  — 1  • 


For  j  =  2  we  have 


(5.18) 


1  —  ru  —  Uj,!',,  -)-  U t.i  —  1  ^’t  —  l ,t  —  1  —  i  4"  ^i,i  — 1  — 1  ,i  — 1  ’ 


for  j  =  ?  —  1  we  have 


(5.19) 


P  —  t . t  —  1  —  U ii l-’it U  i  —  1  ,i  4"  U i  ,i  —  1  —  1 ,j  —  1  U i  —  1  ,i  —  1  —  U  j ,i  —  1  V{  _  i  ,j_ i . 


From  this  last  expression  we  deduce  that 


(5.20) 


U...-1  = 


Vi  —  l,t  —  1 


?  =  2, . . . ,  T. 


Using  this  expression  in  (5.18)  we  have 


(5.21) 


va  =  1  - 


P  ~  P 


Direct  evaluation  provides  the  values 


(5.22) 


t>ll  =1,  l>22  =  1  ~  P,  t’33  = 


1  -  2  p2 


I -pi'  U44  1  —  2p2  ’ 


1  -  3p2  +  p4 


in  agreement  with  (3.2)  and  (5.15).  We  then  complete  the  proof  by  induction: 


(5.23) 


a,_  2  P2  Aj_i  —  p2Ai_2  A, 


V "  “  A._, 


A,_i  ’ 


because  the  determinants  of  the  R  matrices  of  various  order  satisfy 


(5.24) 


A t  —  A,_i  p  Aj_2, 


see  Shaman  (1969).  Since  further  U  and  V  are  unique,  the  proof  is  completed. 


Note  that  (5.15)  holds  for  the  Cholesky  decomposistion  of  any  positive  definite  matrix 
R.  As  defined  in  (2.8),  P  is  symmetric  and  positive  definite  for  any  value  of  a.  Hence, 


its  unique  Cholesky  decomposition  exists  and  can  be  written  in  ways  similar  to  (5.13)  and 
(5.14),  namely, 


(5.25) 


P  =  TT  =UVU 


where  the  T,  U .  and  V  matrices  have  the  same  structure  as  for  R.  An  argument  similar 
to  that  in  Proposition  2,  together  with  the  fact  that  in  terms  of  a  the  determinants  of  the 
P  matrices  satisfy  the  relation  As  =  (1  +  q2)As_i  —  q2As_2.  lead  to  the  expressions 


(5.26) 


-  ~ 


l_a2(S+l) 


Aa_i  1  -  <*2s 


s  =  1, - T, 


(5.27) 


a  1  —  a 

tij+1,4  =  —  =  a 
V33 


2  3 


1  _  q2(s  +  1) 


s  =  i,...,r- 1. 


These  are  simpler  expressions  in  a  compared  to  those  involving  the  polynomials  in  p  given 
in  (3.2). 

Proposition  2.  The  components  of  T  in  (5. IS)  satisfy 


(5.28) 


= 


A,_r 


s  =  1  , 


(5.29) 


t»,8  —  1  —  P\ 


fA,-2 

A,_i  ’ 


s  =  2, . . .  ,T 


Proof.  Comparing  (5.13)  and  (5.14)  we  see  that  T  =  TJV J/2,  from  which  (5.28)  and 
(5.29)  follow.  I 

5.2.2.  Using  the  Cholesky  decomposition  to  compute  quadratic  forms 

We  now  use  the  results  of  the  previous  section  to  derive  expressions  for  q} 0,  j  =  0, 1. 2. 


(5.30) 


900  =  y'R-'y  =  i/(UVU')-'v  =  (U-'yYV-'iU-'y) 

T  2  T  A 

,T,~  1  W3  ^*-1  2 

=  WV  »  =  =  L— u,<' 

,  V»8  ,  *-*8 

*=]  1 
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where  w  =  U  1y.  It  suffices  to  find  w  in  the  linear  system  y  =  Uw.  This  system  is 


(5.31) 


( yi  \ 

(  1 

0  . 

0 

0\ 

/"1\ 

V2 

1*21 

1  . 

0 

0 

W2 

\yr / 

V  0 

0  . 

•  UT,T—1 

J 

V  / 

and  hence  it  follows  that  w i  =  y\ , 


(5.32)  ws  =  y3  -  =  ya  -  s  =  2,...,T. 

1 

Let  us  now  define  w*  =  V~lw  =  (u?i/rn, . . . ,  wt/vtt)' ■  Then. 


(5.33)  q10  =  y'R~2y  =  y'  {UVU')~\UVU,)-ly  =  ie'  V^U^U'-1  V^w 

T 

=  w’,lU~1(U')~1w*  =  *'sc  = 


s  =  l 


where 


(5.34)  ( U’)~lw *  =  (L/')-1  V'u;  =  (UY'V-'U-'y  =  {UVU’y'y  =  R~'y  =  x, 

as  defined  in  (5.5).  Hence,  it  suffices  to  find  x  in  the  linear  system  w*  =  U'x.  This  system 


is 


/  ™i/*hi  \ 

/I  u2i  0  ...  0  0  \ 

/  Xl  \ 

W2/V22 

0  1  U32  ...  0  0 

x2 

Wt-i/vt-  1,T-1 
\  wt/vtt  / 

0  0  0  ...  1  ux.r-i 

\0  0  0  ...  0  1  / 

XT- 1 

V  XT  / 

(5.35) 


and  it  provides  the  recursive  relations 
(5.36)  xt  =  ——  —  ~r~wTl 


vtt  Ax 


xt 


WT-a  -  PZT-s+1  Ar-,-1  T 

_t  =  -  =  — - (tUT-s  -  pXT-s+\ ),  s  =  1, . . . ,  2  -  1. 


VT-a,T-a 


A  T- 


Finally, 

(5.37) 


920  =  V'R-3V  =  x'(UVU')-1*  =  (V-'x)'V-'(U~'x) 
T  ,  9  T 


=  ft-v-’k  =  £  —  =  E 

“  A. 


«=1 


»=1 
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where  h  =  U  J*.  It  suffices  to  solve  for  h  the  linear  system  Uh  =  x.  This  system  is 
similar  to  (5.31)  and  hence  it  follows  that  h\  =  xj, 


(5.38) 


ha  —  xs  Ugg—\hg—i  —  xa  p 


As¬ 


s-2 


Aj-l 


‘s-l. 


s  =  2, . . .  ,T. 


We  summarize  these  results  as  follows:  With  observations  yi,t/2, . . .  ,yr  we  compute 


(5.39) 


<7oo 


EAS_1  2  A,-2 

— r — u>a,  m  =  yii  ws  =  y3-  p- - s 

^s-1 


2. 


.  T\ 


(5.40) 

T 

Qio  =  53  x],  XT 

3=1 


At-i 

At 


u’r, 


XT— 3 


Ar-s-i,  , 

— - (u»r-s  -  pxr-s+i ), 

&T-s 


s  =  1, . . . ,  T  —  1; 


(5.41) 


T 

$20  =  53 
s=l 


Aj-i 

A3 


*2, 


L  _  _  _  _As_2l 

rt  j  —  x  j  ^  ft  j  —  x  j  p  ft  $ — j  ^ 

A.,-i 


s  =  2, ....  T. 


5.3.  Successive  elimination 

From  the  preceding  discussion  it  follows  that  we  have  to  solve  certain  linear  systems. 
Let  us  consider  (5.7)  in  detail,  namely,  Rx  =  y :  we  have  to  solve  it  for  x  for  a  given 
vector  of  observations  y,  and  a  given  matrix  R  evaluated  during  an  iterative  procedure. 
The  method  of  successive  elimination  corresponds  to  multiplying  the  system  on  the  left 
by  the  matrix  F  that  is  lower  triangular  with  diagonal  elements  1  so  that  in  the  resulting 
system 

(5.42)  FRx  -  Fy 

FR  is  upper  triangular.  This  upper  triangular  linear  systsem  is  called  the  “forward  solu¬ 
tion”  of  the  method  of  successive  (or  Gaussian)  elimination,  or  pivotal  condensation. 

Anderson  (1971)  gave  this  procedure  in  detail  for  the  case  of  (5.42);  see  also  Anderson 
(1984),  Appendix  A,  Theorem  A. 1.2.  Using  this  approach,  y'R'2y  and  y'R~1GR~1y 
were  evaluated  and,  for  example,  the  final  expression  for  the  former  coincides  with  (5.40). 
This  is  so  because  the  Cholesky  decomposition  is  equivalent  to  the  forward  part  of  the 
method  of  successive  elimination.  We  summarize  these  details  here  for  the  sake  of  com¬ 
pleteness,  and  because  they  provide  a  practical  way  to  calculate  the  elements  introduced 
in  Section  5.2. 
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The  product  FR  corresponds  to  successive  left  products  by  elementary  matrices  F3 
so  that 


(5.43) 


FR  —  Ft -i  •  •  •  F2F\R. 


Fj  adds  to  row  j  4-  1  of  t,h<°  product,  F,-i  ■  ■  •  F2F\R  a  multiple  of  its  row  j  so  that  the 
product  FjFj-i  ■  ■  ■  F2FiR  has  elements  equal  to  0  in  column  j  below  the  j,j-th  element. 

Let  Fa  =  (/-j0),  5  =  -  1,  F  =  Fy  =  w  =  (wt, - wT)'.  Let  r{}3-  be 

the  j,j- th  element  of  the  product  FjFj- \  ■  ■  •  F2FiR.  Then, 


(5.44) 


(5.45) 


rU>  _  rU)  _  ' 

rn  -  ^  rjj  -  (j-i 


(j) 


„(  J  — 1 )  2 

r j-i, j-i  -  P 


0-1) 


j  =  2  ,....T, 


j-i.j-i 

rU)  _  P  •  _  ■]  t  _  1 

*:+i,j  (j)’  3  1. 

r  . 

JJ 


The  elements  *j+1  and  j  =  1, . . .  ,T  —  1,  can  be  computed  in  sequence.  Then 

compute  w  as  follows: 

(5.46)  Wi  =  yi ,  ’  i  =  2, . . . ,  T. 

Thus,  the  elements  of  tn  can  also  be  calculated  in  sequence.  Finally  calculate 


rO) 


(5.47) 


wT 

®  J 

rTT 


r0) 

JJ 


Having  calculated  x  we  compute  goo  and  gio  using  (5.9). 
Comparing  (5.21)  with  (5.44)  we  deduce  that 


(5.48) 


r<;)  _  v - A? 


Ai-i  ’ 


and  in  fact  we  are  calculating  the  diagonal  elements  of  V  in  (5.14)  in  an  (ascending) 
sequence.  Comparing  (5.16)  with  (5.45)  we  deduce  that 

(5-49)  /)?,.,  = 

The  vector  x  as  given  in  (5.47)  is  the  “backward  solution”  of  the  method  of  successive 
elimination  or  pivotal  condensation.  In  effect,  (5.47)  can  be  written  as 

/O  0  ...  0  0  \ 

0  0  fiV  ...  0  0 


(5.50) 


x  =  V~1w  4- 


0 

Vo 


0 

0 


0 

0 


0  fgr-1 

0  0  / 


*, 
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where  w  depends  on  yi,  ■  ■  •  ,yr,  and  this  is  the  form  of  the  backward  solution. 


5.4.  Another  recursive  procedure 


The  linear  system  y  =  Rx  can  also  be  solved  by  repeated  substitution,  as  follows. 
Using  R  =  I  4-  pG  we  have 


(5.51) 


XT- 1 
XT 


px2 

px  J  +  px3 

pXT-2  +  PXT 
PXT- 1 


By  repeated  substitution  we  have 

xi  =yi  -  px 2  =  cnt/i  +  61X2, 

x2  =  y2  -  pxi  -  px3  =V2+  (~p)yi  +  (-p)x 3  +  (-p)2x2 

=  (1  -  p2)-1  {(-p)yi  +V2  +  (~p)x3)  =  C2iy  1  +  C22y2  +  &2x3. 
In  general  we  have 


(5.52) 


Then 


xt  =  ^2ctjyj  +  ktxt+i,  t  =  l, . . .  ,T  —  1. 
>=1 


xt+1  =  yt+i  -  px<  -  pxt+2  =  t/t+i  -  pxt+2  +  (~P)  S  Ct>yj  + 


=  (1  +  pbt)  1  +  yt+i  +  (~p)xt+ 2 1 , 

and  the  recursion  for  the  coefficients  is 

c<+i.i  =  —  i+p6,  c<i»  j  = 

(5.53)  =  1^7,  j  =  *  + 

^  =  “  i+pfc,  ’ 

where  these  expressions  hold  for  t  ■=  1  —  1,  and  we  either  define  bx  =  0  or  take 

xt- t-i  =  0.  The  resulting  system  is 

/  xi  \  /  cn  0  ...  0  0  \ 

I  xo  1  I  C21  c22  ...  0  0  1 


(5.54) 


XT- 1 

xr 


c2l  c22  ...  u  u 

CT— 1,1  CT-1,2  ■■■  CT-l,r-l  0 

CT\  CT2  ■  ■  ■  CT,T- 1  CTT 


where  cu  =  1,  61  =  —  p.  Denoting  C  =  (ctJ),  (5.54)  can  be  written  as 


(5.55) 


*  =  Cy  + 


/° 

61 

0 

0 

\ 

•  0 

0 

b2 

0 

•  •  0 

0 

0 

■  •  ^T— 2 

Vo 

0 

0 

0 

/ 

L  P  At-1  At)  ,  -  rp  - 

°t  = - =  -P— r~  =  Jt+\,t  =  -«f+i,t,  *  =  1, . . . ,  T  -  1, 

vtt  At 


which  compares  with  (5.50).  In  fact, 

(5.56) 
while 

(5.57) 


ctJ  =  — - -  =  {-p)  - 


J  =  1 ,...,*• 


The  ctj  can  be  obtained  from  V~1w  in  (5.50)  by  repeated  substitutions. 


6.  Evaluation  of  Traces  in  the  Time  Domain 

6.1.  Introduction 

In  Section  6.2  we  will  evaluate  *io  and  <20  by  means  of  series  expansions;  we  now  show 
how  all  traces  tjk  used  in  Section  4,  where  j  >  k,  can  be  expressed  in  terms  of  the  tj0-  In 
effect,  using  again  that  G  =  p~1(R  —  I)  we  have 

(6.1)  tjk  =  tr  R~]Gk  =  p~k  tr  R~j(R  -  I)k 

=  p~k  tr  ]T(-1)*-S  (k)Ra  =  P~k  ^-l)*'8  (J)  tr 
«=o  ' 5 '  »=o 

=  p-‘E(-ir  (!)<,-., 0,  )>t. 

»=o  '  ' 

For  example 

(6.2)  tn  =  ~{T  —  tio),  *21  =  -(*10  —  *20),  *22  =  -x —  2*io  +  *20), 

p  P  P 
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These  relations  can  be  used  to  express  the  iterative  procedures  of  Section  4  as  functions 
of  the  various  quadratic  forms  and  of  the  tjo  ■  For  example,  in  Iterative  Procedure  1,  (4.11) 
becomes  in  terms  of  qjo  and  tjo  only, 


(0.3)  |  |t20  q00  —  r10  q1Q 


+  Tq 


^i-1)  _ 


10 


10 


sir1’}?10 


Sti-lWi-1)  T(.-lMt-l) 
l20  900  ~  r10  9l0 


and  in  Iterative  Procedure  3,  equation  (4.20)  becomes 


(6.4)  { 


rp  I —  1 )  1  —  1 ) 

1  ~  T10  l20 


-(i-1)  ^(i  — 1)^1  — 


9oo 


- < 


10 


10  jP 


=  {[' 


9  oo 


< 


10 


yp-J) 

l20 


In  Section  6.3  we  shall  develop  a  procedure  to  evaluate  t\\  and  t^2  in  the  form  of  some 
rational  expressions;  we  now  show  how  to  express  all  other  needed  traces  as  functions  of 
these  two. 

From  (6.2)  we  deduce  that 


(6.5)  tio  =  T  —  ptu,  <20  —  T  —  2ptn  +  p2ti2, 

and  hence  that 


(6.6)  <21  =  <11  —  P< 22- 

These  relations  can  be  used  to  express  the  iterative  procedures  of  Section  4.  For 
example,  expression  (5.3)  for  Iterative  Procedure  1  becomes 


(6.7)  { [?,;-» - [&-» - &-’]  +?<-,ftr,>5<.r,,};5('> 
=-{T-2?<'-”ft-”+[^‘-”]V2r”}[sir”-^rI)]-?<'-1)[n'r1)-?(i-,,4r,>]5<171>. 

while  expression  (5.4)  for  Iterative  Procedure  3  becomes 

(6.8)  ^-”ft-”,1r1Vfl  =  {r-^-”ft-I|+[^-”]2ft-”}5<„rI,-r,ft-,). 


6.2.  Series  expressions 
For  \p\  <  1/2  we  have 

oo  oo 

(6.9)  <10  =  tr  K-1  =tr(J  +  pG)-1  =  tr^(-p)JG>  =  ^  p2*  tr  G2*, 

j= 0  k= 0 


29 


since  tr  G 3  =  0  for  j  odd.  Note  that  (6.9)  converges  because  the  characteristic  roots  of  G 
axe  less  than  2  in  absolute  value  and  \p\<\.  Similarly, 

OO  OO 

(6.10)  t20  =tr  R  2  =  tr  (I  1-pG)"2  =  tr  =  ^(2fc  +  l)p2k  tr  G2k . 

j=  1  (c=0 

It  is  shown  in  Section  7.3  that 


VA  H  g=l  k=g(T+i )  v  3  J/ 


6.3.  Rational  expressions 


To  compute  t\\  and  *22  we  consider  an  expression  for  \R\  and  use  that 


(6.12) 


j-log|H|  =  tr  R~'G  = 


(6.13) 


log  |A|  =  -  tr  R-'GR  'G  =  -<M. 


Anderson  (1971),  Lemma  6.7.9,  shows  that 


(6.14)  \I-6G\  =  (l-402)-*Q^  j  (l  +  VT^)T+1  -  (1  _  vTT^?)T+1  j  . 

Hence,  we  identify  6  =  —p  and  use  this  result  directly.  Let  us  denote  a  —  (1  —  4p2) 2 ,  so 
that  da/ dp  =  —4 p/a.  Then 


(6.15) 


|fi|  =  1/  +  pG |  =  1  (i)T+1  {(1  +  a)T+1  -  (1  -  a)™ }  , 


(6.16) 


log  |H|  =  -logo  -  (T  +  l)log2  +  log  {(1  +  a)r+1  -  (1  ~  a)r+1 }  , 


(6.17)  —  log  |/2|  = 


4P  (r  +  l)^{-(Haf-(l-af} 

a2  (1  +  a)T+1  -  (1  -  a)T+1 


4p  4(T+l)p  (1  +  a)T  +  (l  -a)T 

d 2  a  (1  +  a)T+1  —  (1  —  a)T+1 
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(6.18) 


cP  4a2  +  32p2  4(T  +  l)(a2  +  4p2)  (1  +  a)T  +  (1  -  a)T 

dp2  loS  lKl  ~  a4  a3  (i  +  a)T+i  _  (i  _  a)T+i 


4 


16(T  +  l)p 


a* 


TUl+a)7--1  -(l-a)T-*}{(l+«)T+1-(l-a)T+1} 

{(1  +a)T*'  -  (1  -a)T+!}2 

(T +!){(! +  a)r  + (1  —  o)7-}21 
{(1  +  a)T+1  —  ( 1  —  a)r+1  }2 


Simplifying  slightly  this  last  expression  we  have 
10.  i2,  _  4  +  16p2  4(T  +!){(!  +  a)r  +  (I  —  a)r} 

(  '  9)  d?  °S|K|  a*  a3{(i+0)T+i  _(i_a)r+i} 


16(r  +  l)p2  (1  +  a)2T  +  (1  -  a)2r  +  (4 T  +  8p2)(4p2) 
a2  |(l  +  a)T+i  _(1_a)T+i}2 


2  \T— 1 


6.4.  Using  the  solutions  of  linear  systems 


The  calculations  in  Section  5.3  were  presented  as  part  of  the  computations  needed  for 
quadratic  forms,  but  can  also  be  used  for  traces. 

From  (5.14)  we  have  that  R  =  UVU',  so  that  the  “forward  solution”  (5.42)  FRx  = 
Fy  =  w  is  VU'x  —  U~1y  =  w,  and  simultaneously  U'x  =  V~1U~ly  =  V_1iu.  Note 
that  F=U~1. 

To  compute  <io  =  fcr  R~1  we  set  RX  =  I,  where  X  and  I  are  of  orders  T  x  T.  In 
successive  elimination  the  forward  solution  is  FRX  =  F  or  VU'X  =  I/-1.  We  get  U~1 
and  (diagonal)  V  by  recording  the  steps  of  the  forward  solution  of  Rx  =  y.  Then 

(6.20)  tio  =  tr  R~l  =  tr  X  =  tr  (l/')"1  V"1!/"1 


=  tr  F'V-1F  =  tr  V~lFF' 


=  >+£ 


i+/?.+/,2 


s2 


+ - t" 


where  F  =  (/tJ).  From  Section  5.3  we  have 


(6.21)  F  =  Ft-\Ft-2  —  F2F1, 

and  that  the  F,  matrices,  for  s  =  1, . . . ,  T  —  1,  are  lower  triangular,  bidiagonal,  and  have 
elements 


*  =  h 


i  =  s  +  1,  j  =  5, 


=  0, 


otherwise. 


(6.22) 
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Using  these  results  we  obtain 


(6.23)  fa -ftjtiit, J  =  1,...,T-1;  i=j  +  l,. 

In  more  detail,  the  elements  of  F  below  its  main  diagonal  are 

(6.24) 

hi  =4!’, 

f  _  r( 2)  r(l)  r  _  r(2) 

/ 31  —  /32  7 21  i  732  —  7 32 

f  _  f(3)  r<2)  r(l)  r  _  r(3)  f(2)  ,  __  ,(3) 

7 41  —  J43  J32  J21  i  74 2  —  J43  732  1  743  —  J43  1 


t  _  f(T-l)  AT-2)  Ml)  r  __  Mi-1)  r\l~i)  r\i)  r _ _  Mi-1) 

7T1  —  Jt,T-\JT-1,T—2  ' ' '  721  1  7T2  —  7t,T-i7t-1,T-2  '  '  732  ’  *  •  •  1  JT,T-l  —  7t,T-i 
We  next  compute  <20- 


rU)  r  _  r(T-i)  ,(T— 2) 


_  x(T-l) 


(6.25)  t20  =  tr  R~2  =  tr(F' V-1  F){F'V~1  F)  =  tr  FF'r’FF'r1 


r  r 

=  tr(V-*FF'V-*)(V-5FF,'U~'5)'  =  tr  HFT  = 

»=i  >=1 

where  H  =  V~^FF'V~  ' 2  is  symmetric,  and  we  have  used  the  circular  property  of  the 
trace.  The  components  of  F F'  are 


(6.26) 


T  min(i,j) 

E  *•/»•  =  E  *,i  =  i . T, 


so  that  the  components  of  FF  are 


(6.27) 


hij  — 


iin(x,j)  r  x  min(«,j)  ,  , 

_  7>i=  1  JtaJjs  \  ^  J  xa  J  J  a 


' vii  \/v)i 


=  E 


a=1  \Ab7 


and  hence 


(6.28) 


T  T  fmmCtj) 


«M  =  EE  E  -£=-%= 


,=  I  J=1  3=1 


i  r  «  r2 


t  r  r  « 


=  y  va  +2VV 


x=l  «»  La=l 


E/..7 


«=1  1=1 
«<) 


"22  ~ 
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These  computations  for  a  given  value  of  p  can  be  added  to  those  presented  in  Section 
5.3  to  compute  q0o  and  qio-  We  now  present  the  computations  needed  for  q0o,  <?io,  920*  Do- 
and  i2o,  followed  by  some  comments  to  facilitate  the  interpretation.  Define  #  =  FF' 

Starting  values  (s  —  1) 


(6.29) 

r(1) 

rn 

(6.30) 

fj] 

(6.31) 

/(1) 

£10 

(6.32) 

<(1) 

r20 

(6.33) 

Step  s<  s  —  2 . T 


(6.34) 

(6.35) 

(6.36) 

(6.37) 

(6.38) 

(6.39) 

(6.40) 


r(s-1)  -  o 2 

(a)  _  ra-l,a-l  P 


T* v  7  ~ — 
38 


3  — 1,1-1 


r(a-l)  _  P 
Js,a-\  (s-l)  ’ 

faj  =  j  =  1,  .  .  •  ,  S  —  1, 


<t>aj  —  'y  ]  fakfjki  j  !,•••, 


•S, 


Jt=l 


(*)  _  (3-1)  $33_ 

*10  —  l10  ~r  ( g )  ’ 

F33 


ts 


_,<*-»>  .  A 

—  l70  <  l 


20  ““  *20 


[rS^]2 


+  2 


J'U— » 

rB3  ,J-1 


+  •••  + 


(«)  (1) 


.  r(a-l) 

Wt  =  y»  +  ^a-1- 
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After  completing  these  computations  we  have 


(6.41) 

c-4- 
k— * 

O 

II 

(— >  ' — ' 
°i 

t 

* 20  —  ’ 

(6.42) 

* 

ll 

ti 

H 

„  _  WJ  ,  Aj) 

XJ  ~  U)  +  /j+i.ix2+1  ’ 
rjj 

J  =  T~  1,.. 

(6.43) 

T 

qoo  =  y  y,r, 

1=1 

910  =  yx2 

,=i 

Finally,  to  compute  920 

we  have 

(6.44) 

w\  =  ri , 

1  .  r(3~  J'  1 

Ws  =  **  +/,,*-! 

s  =  2 . T, 

(6.45) 

rj 

II 

i  - 

S'";  S 

U>'  / ,, 

~  _U) 
jj 

(6.46) 

T 

920  =  y  ZjVj 
i=l 

• 

Note  the  following  points. 

1.  Formulas  (6.34),  (6.35),  (6.40),  and  (6.42)  were  already  given  in  Section  5.3  to 
compute  900  and  910  in  (6.43). 

2.  Similar  computations  in  (6.44)  and  (6.45)  produce  920  in  (6.46).  This  was  dis¬ 
cussed  in  (5.5)  -  (5.9). 

3.  Hence,  the  superscripts  in  and  a-nd  the  indices  in  wa  correspond  to 

the  calculations  being  done  in  sequence. 

4.  To  calculate  *io  and  <20  we  need  to  compute  the  components  of  F  and  *  =  FF' . 
One  row  of  F  is  computed  at  each  step,  namely,  for  row  s,  the  elements  f3}  for 
j  =  1 , . . . ,  s  —  1 .  Further  ft9  =  1  and  ftj  =  0  for  s  <  j.  The  calculations  in  (6.36) 
correspond  to  the  structure  of  fc  given  in  (6.23)  or  (6.24). 

5.  The  components  of  #  =  FF'  ere  scalar  products  of  the  rows  of  F,  and  are 
calculated  in  (6.37),  where  the  sums  can  also  reach  k  =  T  in  each  case,  because 
for  k  >  j  ( j  <  s)  at  least  one  of  the  factors  fjk  is  0. 

6.  In  t\*Q  and  the  superscripts  denote  partial  sums,  so  that  =  <10  and 
t ^  =  *20,  which  is  (6.41). 
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7.  Evaluation  of  Quadratic  Forms  and  Traces  in 
the  Frequency  Domain 


7.1.  Calculation  of  Fourier  coefficients 

The  Fourier  transformation  of  the  operations  (3.8)  is  different  from  the  usual  trans¬ 
formation 


<2  v  -  2  ttj  k 

cos—f~' 

fc=i 


j  =  o,i .  —  , 


2  r —  2  irjk 

rl^Vk  sm  ~t~ ' 


j  =  l,..., 


T  -  1 


Since  the  transformation  (3.8)  diagonalizes  G  and  the  transformation  (7-1)  and  (7.2)  does 
not  diagonalize  G,  the  former  yields  simpler  results,  as  indicated  in  Section  4.3. 

For  large  T,  the  fast  Fourier  transform  can  be  used  for  efficient  computation  of  (3.8). 
We  write  for  j  =  1 , ,T 


r  2 

T- 1-1 

!  2 
T+  1 


TTjk 

ykSmrT~i 


TTjk 

yk  sinrT7 


for  arbitrary  yr+i  since  sin  [7 xjkJ(T  +1)]  =  0  for  k  —  T  +  1.  Further  we  have 


■2(T+I) 


UVi  T.  »*sin 


2(T+1) 


2-rrj  k 

2  (T+  1)’ 


where  y *  =  —y2(T+i)-k,  k  =  T  +  2, . . . ,  2T  +  1,  and  y'2T+2  is  arbitrary.  Then  (7.4)  has  the 
usual  form  of  the  sine- transform  for  2(T  -f  1)  observations  and  the  usual  computations  for 
the  fast  Fourier  transform  are  available. 

7.2.  Evaluation  of  quadratic  forms 

"We  want  to  calculate 


=  w  =  j  =  0,1.2. 
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=  V  I  Z>  ,  ZT+1-S 

U1  +  pday  (i  -pday_ 

__  (1  —  pds)3zj  +  (1  +  pda)3  Zy+1_a 

(1-P2^)3 

__  ^  (1  +  3 p2(Pa)(z23  +  Z^+1_S)  -  (3 pda  +  P3dp(zl  -  4-+i-s) 

tri  (i  -P2dlf 


while  if  T  is  odd. 


2  ,  V''/  (1  +  3p2d7s)(z1  +  Zy+1_s)  —  (3pds  +  p3dl)(z 

(<.ll)  q20  =Z(T+i)/2+  ^  - —^2j3 - 


The  series  form  can  be  used  to  obtain 


(7.12) 


r(j  +  k  +  i) .  ^ks-'jk 


E1U+  K  -t  l 

«r0‘  +  D 


7.3.  Evaluation  of  traces 


Since  the  characteristic  roots  of  G  are 


(7.13) 


da  —  2  cos 


T  +  l 


t  *  $  _ '  II 

=  e  r+r  -(-  e 


the  characteristic  roots  of  G2k  are 


(7.14) 


d2tk  =  (e'tt+e-'ttyK 

~E(2^)eWe”i*h 

>=0  '  J  ' 

n  \  J  / 


(2  fc-j) 


(7.15) 


;  =  0  '  J  '  a=  1 
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<®  to 


Since 


(7.16) 

we  have 

(7.17) 

Then 

(7.18) 


Y  e'2^1*  =  T  +  1  if  (fc  -  j)  =  0,  ±(T  +  1),  ±2(T  +  1), . . 

3=0 

=  0  otherwise, 


=  J  ifj  s  Jfc,fc±(T  +  l),Jt±2(T  +  l),... 

=  —  1  otherwise. 


5=:1 


tr  °ik  =  iz  ((f)  (-i)  +  (r  + !)  f 


j=0 


>=o 

J=*,k±(T+l),.. 


The  first  term  in  tr  <jr2fc  is  —1  times 
(7.19) 

The  second  term  is  T  +  1  times 

k  )  ’ 


2* 


fc  =  r, 


(7.20) 


2k 

k 


l 


+  *«  *+"+!))•  *“r+l--ir  +  1- 


Then 


(7.21)  =  £{(“)  (T +  1) -2”}  ,>■ 

oo 


2  fc 


d-^r  +  i)  +  1)) p2*  +  2(t+ i)  Y1  ( fc  —  2(T  + 1) 


Jt=T+l  N  '  '  "  *=2(r+l) 

The  first  term  in  the  first  sum  in  (7.21)  is  T  +  1  times 

V'  (2^)-!  .2k  _  V"'  T(2l-  +  1)  2t 
fc=ofc!r^  +  1) 


(7.22) 


=  E 


“  m+  ijr(i:  +  i)2I  2t 


>2k 


k=D 


k\T{k  +  l)^ 
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We  have  used  n!  =  T(n  +  1)  and  the  duplication  formula  for  the  gamma  function 


(7.23) 


T(2f3  +  1)  = 


r(^  +  i)r(/3  +  D22^ 


The  first  sum  in  t\ o  is 


(7.24) 


T  +  1  1 

y/l  -  4 p2  1-4 p2' 


which  is  a  good  approximation  to  <io  because  the  neglected  terms  are  0(p2(r+1*).  Thus 
we  obtain 


(7.25) 


<io  = 


T  +  l 


y/l  -  4 P2  1  -  4 P2 


+  2(T  +  1)^  53 

s=l  k=g(T+l) 


2k 

k-g{T+  1) 


„2  Jt 


The  sum  on  A:  in  (7.25)  can  be  related  to  the  hypergeometric  function  as  follows:  for 
each  fixed  g  —  1,2,  •  •  •, 


OO  / 


(7-26)  E  u_ 

k=*(T+l) 


2k 


\k-g(T+l) 


2  k 


=  £ 


(2*1! 


.2* 


*=»(T+1) 


[*  -  s(T  +  !)]![*  +  J(r +!)]!' 


y'[2(fe+g(r+l))]! 

2h+2g(T+l) 

fah'.[h  +  2g(T  +  l)]\P 


2fl(T+i)  Y2'  r[2 h  +  2g(T  +  1)  -f  1]  2h 
9  fan hW[h  +  2g(T  +  l)  +  l }P 

n=u 

p2g(T-f i)22g(T-t-i)  «  r[fc  +  g(T  +  1)  +  l/2]T[h  4-  g(T ±  1)  +  1]  2ft 
fa  h\T[h  +  2g{T  +  l)  +  l]  K  P) 

(2 p)*9(T+»  r [g(T  + 1)  +  i/2]r[ff(r  + 1)  + 1] 

\/7r  T[2g(T  +  1)  +  1] 

F[<7(T  +  1)  +  1/2, g(T  +  1)  +  l;2p(T  +  1)  +  l;4p2], 


where  we  used  the  definition  of  the  hypergeometric  function 


(7.27) 


OO 

F(a,  6;  c;  x)  = 

7=0 


T(a  -f  j)  r(b  +  j)  r(c)  x' 

T(a)  T(b)  T(c  +  j)j\- 
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We  conclude  that 


(7.28) 


*10 


T+_ 1 

7^? 


i 

i  -  ip* 


+  2(T  +  1)£ 

3=1 


(2p)23(T+1} 


r[g(T  +l)  +  l/2]r[ff(T  +  1)  +  1] 
r[2ff(r  +  i)  +  i] 

F(g(T  +  1)  +  1/2,  +  1)  +  1;2  g(T  +  1)  +  l;4p2]. 


(7.29) 


The  argument  in  Section  7.2  can  also  be  used  for  traces.  In  effect, 

T  i  t/2 

—  y-' - —  —  2  - — — ,  for  T  even  , 

^1  +  pda  \-p2dV 


t 


10 


3=1 


S=1 


(T-D/2 

=  1  +  2  E  forTodd- 


5=1 


(7.30) 


*20  =  ]T 


T/2 


=  2EnE 


1  +  p2^ 


^(1  +  p^)2  ^(1-p2^2)2’ 


for  T  even , 


(T"1)/2  1  4. 

=  1  +  2  E  (TT^  ’forTodd' 


The  expression  in  (7.24)  is  an  approximation  to  *io.  This  value  can  be  obtained  by 
approximating  the  sum  defining  *io  by  a  corresponding  integral,  as  was  done  in  Anderson 
(1971).  Besides  (7.24)  this  procedure  provides  the  following  approximations: 


(7.31) 


(T+  1) 


l  -  yT  -  l  p2 
y/l  -  4  p2 


4  P2  \ 

1  ~  4p2  /  ’ 


(7.32) 


*20  ~ 


(T  +  l)y/l  -4p2  -(1-f  4p2) 
(1  -4p2)2 


(7.33) 


*21 


— 4(T  4-  \)py/\  —  4p2  +  8p 
(1-4  p2)2 


(7.34) 


*22 


T  + 


-8  p2  )  4  +  16p2 

4p2)3/2  J  “  (1  -4p2)2' 
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These  five  approximations  also  satisfy  relations  similar  to  those  derived  in  Section  6.1 
among  the  needed  traces,  and  in  particular  they  satisfy  equations  similar  to  (6.2)  and 
(6.5). 


8.  Other  Exact  Maximum  Likelihood  Procedures 

In  this  section  we  collect  some  approaches  related  to  the  procedures  developed  in 
previous  sections. 

8.1.  Calculation  of  quadratic  forms  containing  the  matrix  P. 

We  consider  an  approach  that  permits  the  calculation  of  quadratic  forms  of  the  type 
y,P~Jy.  We  illustrate  the  ideas  with  the  cases  j  =  1  and  2.  These  quadratic  forms  can 
be  used  to  implement  iterative  procedures  in  terms  of  a  or  to  compute  quadratic  forms  in 
an  iterative  procedure  for  p  by  using  (2.17). 


/0  0  ...  0  0\  /q\ 

[  1  0  ...  0  0  1  /  0 

L=  0  1  •••  0  0  ,a=  0  ,  B  =  I  +  aL,  Q  =  BB' 


0  0  ...  1  0 


We  see  that  B  is  nonsingular, 

(8.2) 


P  —  Q  +  aa', 


(S’3)  P~l  =  Q'1  ~  7  -,0~r-  <rwcr\ 

1  +  a  Q  a 

This  is  a  simple  case  of  a  general  formula  called  Woodbury’s  formula  by  some  authors;  see, 
for  example,  Phadke  and  Kedem  (1978)  and  Press  (1982). 

Calculation  of  y  P~ly. 


»— i _ 'o-l 


(8.4)  y'P  1y  =  y'Q  'l/-— — f— r -y'Q~laa'Q  ly 

l  +  a  Q  a 


=  V'B'-'B-'V  -  *  B  °Q  *  *  * 

z'kk'z  _  ,  (z’fc)2 

=  zz~TTkrk~zz~TTkrk' 
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where 
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(8.19)  n  —  B'  1Jb,  k  =  B'n. 

These  systems  can  be  analyzed  in  the  same  way  as  (8.7)  and  (8.10)  to  provide  the 
following  recursive  procedures: 

(8.20)  rriT  =  ZT,  mj  =  Zj  —  arrij+i,  j  —  T— 


(8.21)  nr  =  fcr,  rij  =  kj  -  arij+ i,  j  =  T  —  1, . . . ,  1. 

The  rij  axe  given  explicitly  by 

I  _  a2(T-j+l) 

(8.22)  rij  =  -{-at)3 - j  __  —g - ,  j  =  l,...,T. 

We  proceed  as  follows:  We  have  z\,...,zt  available  from  the  calculation  of  y'P_1y, 
and  also  S2r  from  (8.15).  We  then  start  with 

(8.23)  mj  =  zt,  nr  =  -(-a)r,  S3T  =  Zt,  S*t  =  a2T,  Sst  =  —{—q)Tzt- 
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We  then  compute  in  succession 


(8.24)  m.j  =  Zj  -  amj+1,  rij  = -(-a)j  -  anj+1,  j  =  T~ 
and 


(8.25) 


S$j  — -  ,  S^j  —  *^4 ,j — i  4~  — l  4”  TTijTij , 

j=T- !,...,!  . 


Then 

(8.26) 


y'P  2y  =  S31  4- 


1  —  o2  V 


1  -  o2r+2 


*^2T*^41  —  2 


1  —  cr 


1  -  o23>2 


S2TS5i 


8.2.  Estimation  using  the  EM  algorithm. 


The  analysis  in  the  preceding  section  can  be  related  to  the  EM  algorithm  for  computing 
maximum  likelihood  estimates,  as  described  for  example  in  Dempster,  Laird  and  Rubin 
(1977). 

The  generating  equations  for  yi, . . . , yr  coming  from  the  MA(1)  model  (1.1)  can  be 
written  as 


fyl\  , 

/  U !  4  aiio  \ 

( 1  0  0  ...  0  0\ 

(Ul\ 

/a\ 

V2  I 

u2  4-  a«i  j 

..  0 

, . .  ►_ l 

•  •  o 

.  .  o 

•  •  •  o 

4"  uq 

0 

\yr  / 

\ut  +  auT-i  / 

\0  0  0  ...  a  1 } 

Vttr/ 

UJ 

(8.27) 


In  terms  of  the  notation  used  in  (8.1)  we  write  this  as 

(8.28)  y  =  (I  +  aL)u  +  u0a  =  Bu  +  uoa, 


which  in  turn  can  be  written  as  the  transformation 


(8.29) 


(:)-(:£)(:) 


We  take  (uo,u')'  as  N(0,ct2It+\)-  The  transformation  (8.29)  has  Jacobian  equal  to  1,  and 
hence  ( uo,y ')'  is  normal  with  expectation  0  and  covariance  matrix 


(8,3°)  *2(a  b)(o  aa'  +  BB')  “^(a  p)  ' 
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The  determinant  of  this  covariance  matrix  is  1. 

To  use  the  EM  algorithm  we  augment  the  observations  yi , . . .  ,  yr  by  the  unobserved 
uq  and  consider  it  as  a  “missing  observation.”  The  EM  algorithm  is  an  iterative  procedure. 
Given  preliminary  values  of  the  parameters  a  and  a2  we  obtain  an  estimate  of  uq  say  u g1’ 
as  the  conditional  expectation  of  uo  given  y  and  preliminary  values  of  a  and  a2 .  Next 
we  obtain  maximum  likelihood  estimates  of  o  and  a2  on  the  basis  of  (u^^y').  Because  a 
appears  only  in  the  exponent  of  the  normal  distribution  of  ( ixo  ^  .  this  step  amounts  to 

minimizing  the  quadratic  form  in  the  exponent  of  the  normal  distribution  of  (uo,y')  and1- 
then  maximizing  the  resulting  concentrated  likelihood  with  respect  to  a2 .  However,  since 
the  value  of  a2  is  irrelevant  to  maximizing  the  likelihood  with  respect  to  a,  one  can  carry 
out  the  iteration  with  respect  to  a  and  after  its  completion  find  the  estimate  of  a2 . 

To  study  the  joint  density  /(uo,y)  we  use 

(8-31)  f(u0,y)  =  g(u0\y)h(y). 


From  the  covariance  matrix  (8.30)  we  find 


(8.32)  £(U„|y)  =  a'p-'y  =  a'  (V1  -  - ■-  ) -Q~'aa'Q-')  y  =  ■ 

\  1  +  aQ  a  /  1  +  aQ  la 


(8.33) 


Var 


(«o|y)  =  1  —  a  P  1  a  —  \  —  a  (q  1  -  ,  -  j- Q  laaQ  A  a  =  : -  —  ]  , 

V  1  +  aQ  a  J  1  +  aQ  a 


while  £y  =  0,  Var(y)  =  P. 

Hence,  the  exponent  in  the  joint  density  of  (txo,  y')  is  —  ^  times 


(8.34) 


y’P  Jy  +  (1  +  a'Q  la)  (u0  -  —  }  a'Q  ly 

\  14 -  aQ  a 


,J( 0  0'  \  fl  +  a'Q-'a  - a'Q~ 1 

P-Iy  +  (V  —Q~1a  (1  +a’Q-1a)-1Q-1aa'Q-1 


We  now  apply  the  EM  algorithm. 


uo 

y 


E-step.  For  a  given  value  of  a  calculate 
(8.35) 


w  ,  ,  a'Q  }y 
u„  =  £(u0|y)=  1+a,Q-la- 
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M-step.  Minimize  with  respect  to  a  the  quadratic  form 


(8.36) 


(«o  ,y') 


1  0'  \  / 1  a' 

a  B  { 0  B' 


-l 


u0 

y 


,,  I  1  +  a'Q  1(2  ~a'Q  1  \  (  u0 
=  (U0’9)(  -Q-a  Q  -  JU 


=  Uq(\  +  a'Q  1a)-2u0a'Q  1y  +  y'Q~ly 


With  the  minimizing  value  of  a  repeat  the  E  and  M  steps. 

Since  Q  =  BB',  z  =  B~ly ,  k  =  jB-1  a  a~  used  in  Section  8.1,  we  have  that  in  this 
notation  (8.35)  is 


(8.37) 

and  (8.36)  is 


q'B'-1  B~ly  =  z'k 
1 +  a/J3/_1B~1a  l  +  k'k' 


(8.38) 


u0(l  +  k'k )  —  2u0z'k  +  z'z. 


If  in  (8.38)  we  substitute  for  u0  expression  (8.37),  we  obtain 


(8.39) 


(z'k)2  o  (z'k)2  ,  ,  (z'k)2 

- — rrr  -  2- — ttv  +  *  *  =  *  *  -  1 — rrr 

1  k  k  1  +  fcfc  1  +  fcfc 


y'P~ly 


in  view  of  (8.4).  Hence,  we  are  minimizing  y'P  with  respect  to  a,  but  doing  the 
iterations  via  the  EM  algorithm. 


8.3.  Use  of  the  explicit  components  of  the  inverse  covariance  matrix. 

As  indicated  at  the  beginning  of  Section  3,  the  likelihood  function  can  be  written  as  a 
function  of  a  and  a2  in  terms  of  the  determinant  (3.1)  and  the  components  (3.4)  of  P~x . 
In  effect, 


’) 


(8.40)  v'p-'y  =  (1_--,)(11-  V(f-4i)){E^(1  -  “J,)d  -  “J(r"+1>: 

+2  12S^*+t(“0[)t(l"a2')(l“a2(T  * 

»-i  t=  i  ' 

Godolphin  and  de  Gooijer  (1982)  derived  from  the  likelihood  function,  expressed  in 
these  terms,  an  iterative  procedure  for  a. 
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8.4.  Relation  to  optimal  prediction. 


The  likelihood  function,  for  example  (2.11),  can  be  written  in  another  equivalent  form 
by  using  (5.25),  (5.26),  and  (5.27).  In  effect, 


(S.41) 


ipi = \uvu\ = ivi = n «... 


(8.42) 


y'p-ly  =  {U  \)'V  \U  ly)  =  w'V  'w  =  £  ^ 


T  -2 
— '  U'  Z 


where  in  analogy  to  (5.32)  we  define 


(8.43) 


ws  =  Vs  -  Ua,s-lWa-i  =  ya  -  Ot- - Wg-!. 

va- l,i-l 


Hence,  (2.11)  becomes 


(8.44) 


X*(a,<72)  =  (2tt)  t/2(<x2)  t/2  JJti 


exp{-£*£f£}’ 


with  vaa  =  A,/As_i  defined  in  (5.26)  [A,  =  (1  —  a2(-*+1))/(l  —  a2)  from  (3.1)],  and  wa 
defined  in  (8.43). 

This  expression  can  be  related  to  minimum  mean  square  prediction  (and  hence  to 
Kalman  filtering),  as  several  writers  have  recently  emphasized.  In  effect,  we  can  prove 
that  in  our  case. 


(8.45) 


=  ya-  £(y*|y.-i,...,yi),  a  =  2,...,T, 


with  w i  =  0,  is  the  error  of  the  optimal  prediction  of  ya  based  on  y,_i , . . . ,  yi ,  and 
(8.46)  o2vaa  =  Vai(wg)  -  £  j  \ya  -  £{ya\ya-U  . . . ,  yx  )]2 |y,_i , . . . ,  yi  }  . 

Harvey  (1981),  while  considering  the  Kalman  filtering  approach  to  the  problem  of 
estimation  in  the  MA(1)  model,  gave  (8.44),  (8.45),  and  (8.46),  and  wrote  the  recursions 
(in  our  notation)  as 


(8.47) 


vaa  =  1  + 


Q2a  _l-a2(,+  1) 

1  +  a2  +  ■  •  •  +  a2(*-H  1  —  <>2j  ’ 
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,  .  u"a_!  1  —  a2's-1/ 

(8.48)  wa  =  y„  -a- -  =  y3  +(-a)— - 7—  U's-i- 

Vi-l.j-i  l-Q^ 

Brockwell  and  Davies  (1987),  Section  8.6,  gave  what  corresponds  to  our  analysis  in  (8.41) 
-  (8.46)  of  this  section,  based  on  a  different  approach. 

We  complete  the  details  by  using  a  standard  argument  in  the  operation  with  mul¬ 
tivariate  normal  densities.  The  likelihood  function  is  obtained  by  considering  the  joint 

normal  density  of  y\ . yr  as  func  ion  of  its  parameters.  A  joint  density  /(yj . yj ) 

can  be  written  as 

(8.49)  /(y j , . . . ,  yT)  =  /(yrlyr-i ,  -  •  • ,  yi )/(yr- 1  lyr-2, . . . , yi )  •  •  •  /(y2  |yi  )/(yi ), 

that  is,  a  product  of  conditional  (and  one  marginal)  densities.  In  the  multivariate  normal 
case  that  we  consider,  all  these  densities  are  normal.  Tne  expected  values  can  be  written 
as  functions  of  the  (f  —  1)- dimensional  vectors  of  covariances 

(8.50)  [  Cov(y<,  y* _  1 ),  Cov(yt,  yt_2), . . . ,  Cov(yt,  t/i )]  =  a2  [a.  0, . . . ,  0] .  t  =  2, . ..  ,T  , 

where  we  used  (2.2),  and  of  the  (t  —  1)  x  (<  —  1)  matrix  St- 1  that  contains  the  covariances 
corresponding  to  the  set  y  1, . . . ,  yt-i-  Hence, 


(8.51)  £(yt\yt-\,. « •  ,yi)  =  <rz(a,0,...  ,0)'JTt~_11 


yt- 2 


=  (a,0,...,0/F'\ 


(-i 


\  yi  / 
yt- 2 

'  yi  / 
1-1 


V''  1  j  v' ,  N,_i  1  —  a2(<  J) 

=  *2^Pt-iyt-j  =a}^(-a)J  -  -_-2  —  y<- 

;=i  7=1 


<-1 


while  for  t  =  1  we  take  this  expected  value  to  be  equal  to  yi . 
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Similarly, 

(8.52) 

Vaxi  Vt|y<-i, .  •  •  ,yi)  =  Va x(yt)  -  cr2(a,0, . . .  ,0 )’ PtZ\ 


/«\ 

0 

\o) 


=  a2(l  +  a2)  -  a2oc2p)K  =  a2(  1  +  a2)  -  <r2a2(l  -  a2'''1')/)!  -  a2t) 


1  -  a2t 

Substitution  in  (8.49)  gives 


21-o2(,+,)  2  A,  2. 

_  a ' - — —  =  a  -  =  <7zvtt. 


(8.53)  I*(a.<j2)  =  (2tt)  t/2  <72£>aa^ 


-i/2 


exp<  - 


i  T  t  r  "-1  a  ->2 

E  —  i  V' + E  (-a)J 

ua*  L  j-,  AAs-1 


2<72 


3=1 


Comparing  with  (8.44)  we  see  that  it  suffices  to  show  that  the  expression  in  brackets  equals 
w,.  From  (8.43),  by  reperted  substitutions,  we  obtain 

1  o  1 


a 


(8.54)  ws  =  y,  -  - - =  ya  -  r - y,-\  +  a2  — 


^’a-l.a  — 1 


u3  — l,s  — 1 


=  Vs  +  (-a)- — - ya-1  + - 1-  (-a)'CT - - r - Vs-k 


-Ws- 2 


+  (-«r+1* 


*>3-1, 3-1 
Jt+1 


*>3  —  l,s  —  1  '  ‘  '  vs  —  k  —  \,s  —  k  —  1 

and  the  result  fallows  because 


■  ■  ■  V9-k,  i—k 

- W„-k-\ , 


(8.55) 


Aa_i  As_2  Aa_*  A,_i 

vs— 1,3  —  1  '  '  ’  *>3  — k,s  —  k  —  -  ~  -  —  - 

A, -2  Aa-3  Aa_fc-1  Aa_fc_i 


9.  Numbers  of  Operations  Needed  to  Do  the  Calculations 

Quadratic  forms  and  traces  were  given  (in  the  frequency  domain)  in  Sections  7.2  and 
7.3,  respectively,  for  T  even  and  for  T  odd.  To  simplify  the  analysis  in  this  section  we 
consider  the  case  of  T  even,  since  we  are  interested  in  orders  of  magnitude  of  the  numbers 
of  operations. 

The  traces  f  io  and  #20  are  given  in  (7.29)  and  (7.30).  Assuming  that  d3  =  2  cos  ks/(T+ 
1)  is  available  in  the  computer,  we  calculate  <Pt  for  s  =  1, . . . ,  T/2  once  and  for  all,  and  use 


that  d s  =  —  dr+i-s  for  s  =  (T / 2)  +  1, . . .  ,T;  we  also  calculate  p2  once  for  each  iteration. 
Then  (7.29)  requires  Tf 2  multiplications  p2cPs,  T/ 2  subtractions  1  —  p2cPs,  T/2  inversions 
and  T/2  additions.  Formula  (7.30)  involves  additionally  T/2  additions  1  4-  p2d],  T/2 
multiplications  to  obtain  the  square  in  the  denominator,  T/2  divisions,  and  T/2  additions. 
For  tio  and  #20  we  have  altogether  2 T  additions  or  subtractions,  T  multiplications,  and  T 
divisions.  See  summary  table  below. 

We  next  consider  the  calculation  of  the  quadratic  forms  given  in  equations  (7.6)  to 
(7.11).  The  calculation  of  the  z3  in  (7.4)  is  about  Tlog2T  multiplications  and  additions, 
but  that  is  done  once  and  for  all.  For  T  even,  <joo  is  given  in  (7.6).  The  sums  z2  +  r^+1_s 
and  differences  z 2  —  are  calculated  only  once.  The  additional  computations  for 

one  iteration  is  T  multiplications  to  obtain  pda  and  then  pds(z 2  —  z^+l_s),  T/2  additions, 
T/2  divisions,  and  T/2  additions.  For  T  even.  q\Q  is  given  by  (7.8).  This  is  additionally  T 
multiplications,  T/2  subtractions,  T/2  divisions,  and  T/2  additions.  Thus  for  q 00  and  510 
we  have  2 T  additions  or  subtractions,  2 T  multiplications,  and  T  divisions.  For  T  even,  <720 
is  given  by  (7.10).  This  is  additionally  2 T  additions  or  subtractions,  5T/2  multiplications, 
and  T/2  divisions.  Finally,  for  q00,  q10,  and  320  we  have  4 T  additions  or  subtractions,  9T/2 
multiplications,  and  3T/2  divisions. 

The  calculations  in  the  time  domain  were  summarized  in  equations  (6.29)  to  (6.46). 
To  compute  g0o  and  g10  in  (6.43),  we  use  formulas  (6.34),  (6.35),  (6.40),  and  (6.42),  which 
were  also  given  in  Section  5.3.  Considering  as  if  we  had  T  steps  instead  of  the  T  —  1 
actually  considered  there,  they  involve  5 T  additions  or  subtractions,  4T  multiplications, 
and  3 T  divisions.  To  compute  <720  in  (6.46)  we  use  formulas  (6.44)  and  (6.45),  that  involve 
additionally  3T  additions,  3T  multiplications,  and  T  divisions. 

The  traces  tio  and  <20  are  calculated  in  (6.36),  (6.37),  (6.38),  and  (6.39).  In  (6.36) 
there  axe 

T  T- 1 

(9.1)  £>-1)=£s  =  -7XT-1) 

»=2  »  =  1 

multiplications.  Then  (6.37)  involves 

(9-2)  X^  X^J  ~  X^  2S^  +  ^  =  2  X^5  "*■  +  2)  =  o  X](g2  +  3s  +  2) 

s= 2  ;  =  1  «=2  »  =  1  *=1 

=  +  3Tlrr_D  -f  2(T  —  1)1  =  ~(T~  1)(T2  +4T  +  6) 

2  (  6  2  J  6 

multiplications  and  additions.  To  obtain  <10  =  t\0  in  (6.38)  we  need  additionally  T  -  1 
divisions  and  T  —  1  additions,  and  to  obtain  <20  =  <20  'n  (6.39)  we  need  additionally 
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_ ^  X*  i  •  i 

Y2s-2  s  =  2^  2)(T  —  1)  additions,  the  same  number  of  multiplications,  and  jT(T  —  1) 

divisions.  Thus  Do  and  £20  require  altogether  (T  —  1)(T2  +  7T  +  18)/6  additions  and 
subtractions,  (T  —  1)(T2  +  10T  +  12)/6  multiplications,  and  (T  —  1)(T  +  2)/2  divisions. 
The  number  of  operations  can  be  summarized  as  follows: 


Quantity 

Time  Domain 

Freq.  Domain 

Computed 

+/- 

X 

4- 

+/- 

X 

-r 

9oo 

4  T 

3T 

3  T 

T 

T 

T 

2 

9io 

T 

T 

— 

T 

T 

T 

2 

9oo,  9io 

ST 

AT 

3  T 

2  T 

2  T 

T 

920 

3  T 

3  T 

T 

2  T 

5  T 

2 

T 

2 

9oo,  9io,  920 

8  T 

7  T 

AT 

AT 

9  T 

2 

3  T 

2 

fio 

(T-1)(T2+4T+12) 

6 

(T-l)(T2+7r+6) 

6 

T-  1 

T 

T 

2 

T 

2 

*20 

(T— l)(T+2) 

2 

(T— l)(T+2) 

2 

T(T-l) 

2 

T 

T 

2 

T 

2 

*10, *20 

(T-1)(T7+7T+18) 

6 

(T-i)(r2+ior+i2) 

6 

rr-i)cr+2) 

2 

2T 

T 

T 

We  can  compare  the  different  procedures  by  comparing  the  number  of  computations 
per  iteration.  The  scoring  procedures  1  and  3  require  the  computation  of  9^,  ,  {['q, 

,  while  the  Newton-Raphson  procedures  require  in  addition  the  computation  of  q2o  ■  It 
will  be  seen  in  the  table  above  that  except  for  the  computation  of  the  Fourier  coefficients 
z\ , . . . ,  zt  the  number  of  computations  carried  out  in  the  frequency  domain  is  substantially 
less  than  in  the  time  domain.  In  particular,  the  number  of  operations  for  and  is  of 
the  order  T3/ 3  in  the  time  domain,  but  of  the  order  4 T  in  the  frequency  domain.  Since  the 
advantage  of  the  frequency  domain  is  in  the  calculation  of  the  traces,  which  do  not  require 
the  Fourier  transform  of  the  data,  the  efficient  calculation  by  any  of  the  procedures  is  to 
compute  the  quadratic  forms  in  the  time  domain  and  the  traces  in  the  frequency  domain. 

Of  course,  counting  the  number  of  operations  is  only  one  aspect  of  the  evaluation  of 
these  methods.  Also  relevant  axe  the  speed  of  convergence  and  the  behavior  in  medium¬ 
sized  samples. 


10.  Box-Jenkins  Procedures 


In  this  section  we  consider  the  approach  of  Box  and  Jenkins  (1976)  for  computing  the 
quadratic  form  qoo(a)  =  y' P~1y  and  its  derivative  dqoo(a)/da  for  any  given  value  of  a. 
Box  and  Jenkins  (1976)  proposed  to  estimate  a  by  minimizing  qoo(ct);  operating  with  this 
objective  function  is  different  from  maximizing  the  likelihood  function  or  minimizing  the 
concentrated  likelihood  (2.13)  with  respect  to  a  because  the  determinant  \P\  is  ignored. 
See  Box  and  Jenkins  (1976)  Chapter  7.  \ 

As  in  Section  8.2,  let  us  consider  the  transformation  from  (uq,  u')'  which  is  N( 0,  a2  Jr+i ) 
to  (uo,y')' ,  defined  now  by 


(10.1) 


u  o 

u 


=  B~l  U° 

y 


B  —  Bx+i  —  Jr+i  +  OiL 


T+l, 


where,  as  in  (8.1),  Lt+\  has  l’s  along  the  diagonal  immediately  below  its  main  diagonal 
and  0’s  elsewhere.  Let 


(10.2) 


(B-M'B-1  =  (  m°°  ™01 

v  ’  \Tn10  M  n 


where  mjo  =  .  Then  the  quadratic  form  in  the  exponent  of  the  normal  density  of 

(u0,u')'  is  —  1/(2<t2)  times 

(10.3)  (uo,«')(:)=(u.,vo(-: 


=  m  oo 


(u0  +  — moiy)  +  y'  ( 

\  m00  )  \ 


Mu - mlom0i  )  y. 

m0o  J 


Since  the  Jacobian  of  the  transformation  (10.1)  is  1,  in  the  normal  density  of  (uo,u')'  we 
can  substitute  B~1(uo,y>)'  directly  to  get  the  normal  density  of  ( u0,y')',  and  this  in  turn 
can  be  expressed  as  the  product  of  the  marginal  density  of  y  times  the  conditional  density 
of  tio  given  y.  The  quadratic  form  in  the  exponent  of  the  marginal  normal  density  of  y  is 
—  1/(2<t2)  times 


(10.4) 


9oo( 


a)  =  y' 


1 


moo 


■TTlioTTloi 


/ 


and  the  quadratic  form  in  the  exponent  of  the  conditional  normal  density  of  Uo  given  y  is 
-m0o  (uo  +  / (2<x2).  Thus 

(10.5)  £(MV)  =  f  [(““)  |  »]  = 
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because  y  =  £(y|y),  and  hence 


(10.6) 


=  y'  ^ 


Mu - m10m0i  y  =  goo(°0- 

m  oo  / 


In  conclusion,  we  have  shown  that 


(10.7)  qoo(a)  =  y'P  1y  =  y'(Mn - —  mi0m0i^j  y  =  V' [£(ut|y)]2. 

V  moo  J  “ 

To  compute  £(ut\y)  for  t  =  0, 1, . . .  ,T  we  use  the  process  ut  introduced  in  (2.1)  and 
a  process  of  independent  normal  (0,  a 2 )  random  variables  for  which 


(10.8)  yt  =  vt  -\-avi+1, 

that  is,  has  the  “time”  reversed.  Box  and  Jenkins  (1976),  Chapter  6,  call  this  the  “back¬ 
ward”  form  of  the  process. 

From  model  (2.1)  we  have  for  a  given  value  of  a  the  recursive  relations 

(10.9)  £(ut\y)  =  yt  -  a£(ut-i\y),  f  =  l,...,T, 

which  would  provide  all  needed  conditional  expectations  if  £(uo|y)  were  known.  It  turns 
out  that  we  can  obtain  £(ito|y)  from  a  recursive  relation  derived  from  (10.8),  namely 

(10.10)  £(vt\y)  =  yt  -  a£(vt+i\y),  f  =  T,...,l, 

if  we  make  the  additional  assumption  that  for  some  sufficiently  large  T*,  1  <  T*  <  T, 
£(vt •  \y)  =  0.  We  note  that  £(ttt|y)  =  0  for  f  =  0,  —1,  —2, . . .  and  for  t  =  T  +  1,  T  +  2, . . .; 
similarly,  £{vt\y)  =  0  for  t  =  0,-1,  —2, . . .  and  for  t  =  T  +  2,  T  +  3,  — 

Starting  with  £{vt‘\v)  =  0  and  using  (10.10),  we  obtain  £(ur* -l  |y),  •  ■  • ,  £{v\  |y). 
For  t  —  0  (10.10)  yields  0  =  £(yo\y)  =  a£(vi\y).  Then  use  of  (10.9)  for  t  =  0  yields 
£(n0| y)  =  £(y0|y)  =  a£(vijy),  which  is  the  desired  starting  value. 

If  more  accuracy  is  needed,  one  can  obtain  £(yr+i\y)  —  q£(ut |y)  and  £(ur+i|y)  = 
£(l/r+  i  |y)  to  begin  another  round  of  recursions. 

Finally  we  are  in  a  position  to  compute  goo(o)  =  Ylu=o  [£(u‘iv)]  ^or  given  value 
of  a.  The  analysis  of  qoo(oe)  is  illustrated  in  Box  and  Jenkins  (1976),  Section  7.1,  where  it 
is  denoted  by  S(6)  in  general. 
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Suppose  that  o-r,  is  an  initial  value  of  a,  and  let  £(ttt|y,a0)  denote  the  value  of  the 
conditional  expectation  of  ut  given  y  calculated  for  this  value  a0-  For  any  a  we  can 
approximate  £(ut\y,a )  as 


(10.11) 

and  qoo{a)  as 


£(ut\y,a)  ~  £(ut\y,a0)  + 


d£(ut\y,a) 

da 


(a  -  a0), 

a=a0 


(10.12) 


Qoo(a)  ~  i  £(ut\y,a0)  + 

<= o  l 


d£(ut\y,a) 

da 


(a  -  a0) 

a=a0 


2 


Minimization  of  (10.12)  with  respect  to  a  occurs  at 


(10.13) 


a  — a0  = 


e(  |  d£(ut\y,a) 

£<=0£(u<  lv»®o)  d* 

a=a0 

yt 

/Lit= o 

’ d£(ut\y,a ) 

Qr  =  a  o 

2 

da 

From  (10.9)  and  (10.10)  we  obtain 
(10.14) 


da 


da 


<  =  1  T 

t  X,  .  .  .  ,  J.  , 


(10.15) 


d£MyK-.S(v,+l\y)-ad-B^lM,  tm  a . r. 


da 

since  y*,  t  =  1, . . . ,  T  does  not  depend  on  a.  Further,  we  need 

d£(tx0|y)  d£(y0|y) 


(10.16) 


da 


da 


since  £(u_i|y)  =  £(u_2|y)  =  •••  =  0.  To  calculate  (10.16)  we  use  an  approximation 
obtained  by  calculating  (10.15)  recursively  from  some  T*  replacing  d£(vr-  \y)/da  by  0. 
This  leads  to 


(10.17) 


d£{v 0Jy)  d£(y0|y)  d£(v i|y)  c(  ,  N 

0  =  - : - =  - : - a - ; - £{vi\y), 


da 


da 


da 


which  is  solved  for  d£(yojy)/da.  Thus  we  obtain  the  constituents  of  (10.13). 

The  Taylor-series  expansion  is  considered  by  Box  and  Jenkins  (1976)  in  Section  7.2. 
As  in  Section  9  we  can  now  count  the  number  of  operations  needed  to  do  the  calcu¬ 
lations.  These  can  be  summarized  as  follows 
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Quantity 

Computed 

+/- 

X 

T+  1 

T+  1 

T+  1 

T+  1 

9oo(a) 

T 

r  + 1 

dS(vtly) 

da 

T+  1 

T  +  l 

d£(ut\y) 

da 

T  +  1 

T  +  1 

dqoo(a) 

da 

T+  1 

T+  1 

Total 

6T  +  5 

tT  -f  6 

It  should  be  emphasized  that  the  minimization  of  900(a)  is  not  the  same  as  the  maxi¬ 
mization  of  the  loglikelihood  because  of  the  factor  log  \P\  —  log(l  —  o2T+2  )/(l  —  a2).  Even 
for  T  so  large  that  a2T+ 2  is  negligible  the  term  log(l  —  a2)  ~  —a2  may  not  be  small  enough 
to  ignore.  Each  procedure  studied  in  detail  in  this  paper  is  exactly  maximum  likelihood 
in  the  sense  that  the  iteration  is  meant  to  converge  to  a  local  maximum.  Therefore,  these 
iterative  maximum  likelihood  procedures  are  not  directly  comparable  to  the  Box-Jenkins 
procedures. 
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